  • Oaxaca - Unexplained Split Calculation

    Hello,

    I'm using the "split" option of the oaxaca command to assess how much of the unexplained portion of the wage gap between two groups is "in favor of group 1" vs. "against group 2". I am running this with different pairs of groups, and the results keep showing a negative portion for group 1 (e.g., unexplained1 = -7.91) and a positive portion for group 2 (e.g., unexplained2 = 15.83). My understanding is that this output indicates discrimination against group 1 and against group 2, correct? However, in every scenario I run, it doesn't make much sense that there would be discrimination against group 1 (the setup is such that group 1 is always the prototypical "good" employee). My question is: does anyone know how the "split" results are calculated (or, more precisely, how beta asterisk is calculated when oaxaca is run with "split" in Stata)? I've already looked at the oaxaca.ado file but couldn't figure it out. Any help on this would be much appreciated!


  • #2
    Eliza, I'll explain what "split" shows with the following example.

    Code:
    Group 1: Y1 = X1*B1
    Group 2: Y2 = X2*B2
    Pooled sample: Y = X*B
    
    Y1 - Y2 = (X1 - X2)*B + X1*(B1 - B) + X2*(B - B2)
    The last line shows the decomposition result if the "pooled" option is specified. The first term, (X1 - X2)*B, is the explained part; the second term, X1*(B1 - B), is the unexplained part for group 1; and the third term, X2*(B - B2), is the unexplained part for group 2. As B is the weighted average of B1 and B2, the second and third terms should have the same sign for each single variable (signs may differ once the effects of all variables are summed). A complete display of your code and results would help with further examination.
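To see why the three terms add up to the raw gap, the right-hand side can be expanded term by term. The block below only restates the formula above in LaTeX, assuming X_1 and X_2 are the group means of the regressors (including the constant) and B_1, B_2, B are the group-specific and pooled coefficients:

```latex
\begin{aligned}
Y_1 - Y_2 &= X_1 B_1 - X_2 B_2 \\
          &= (X_1 - X_2) B + X_1 (B_1 - B) + X_2 (B - B_2) \\
          &= X_1 B - X_2 B + X_1 B_1 - X_1 B + X_2 B - X_2 B_2
\end{aligned}
```

In the last line every term involving B cancels, so the identity holds for any choice of B; the choice of B only changes how the gap is allocated across the three parts.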
    Last edited by Fei Wang; 14 Jun 2022, 20:44.



    • #3
      Thank you for the reply, Fei. Here's an example using a dataset from a 2013 sample of employed Hispanic workers in metropolitan Chicago. For illustration purposes, I'm using "realwage" as my Y var and "age" as my X var, with the groups defined by foreignborn (0 or 1). Below, I'm pasting the Stata output from the analyses, where I "back out" Beta asterisk (or B in the formula you used) for the 3 terms (explained, unexplained1 and unexplained2) by using the formula and Stata's output for each term. As you can see, the Beta asterisk/B value is different for each term, which makes me think the Stata command is multiplying by some weight to arrive at it. My question relates to that calculation: how does the "split" option calculate Beta asterisk to get to these 3 terms?

      HTML Code:
      [CODE]
      . table foreignborn, c(mean realwage) format(%9.6f)
      
      --------------------------
      foreign.b |
      orn       | mean(realwage)
      ----------+---------------
              0 |      17.582823
              1 |      14.567248
      --------------------------
      
      . table foreignborn, c(mean age) format(%9.6f)
      
      ----------------------
      foreign.b |
      orn       |  mean(age)
      ----------+-----------
              0 |  34.247490
              1 |  40.813560
      ----------------------
      
      . 
      . //
      . oaxaca realwage age, by(foreignborn) pooled split 
      
      Blinder-Oaxaca decomposition                    Number of obs     =        666
                                                        Model           =     linear
      Group 1: foreignborn = 0                          N of obs 1      =        287
      Group 2: foreignborn = 1                          N of obs 2      =        379
      
      ------------------------------------------------------------------------------
                   |               Robust
          realwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
      overall      |
           group_1 |   17.58282   .7105737    24.74   0.000     16.19012    18.97552
           group_2 |   14.56725   .4573414    31.85   0.000     13.67088    15.46362
        difference |   3.015574   .8450303     3.57   0.000     1.359345    4.671803
         explained |  -1.157924   .2760453    -4.19   0.000    -1.698963   -.6168854
      unexplained1 |   7.11e-15   .1366305     0.00   1.000    -.2677908    .2677908
      unexplained2 |   4.173499   .9135592     4.57   0.000     2.382955    5.964042
      -------------+----------------------------------------------------------------
      explained    |
               age |  -1.157924   .2760453    -4.19   0.000    -1.698963   -.6168854
      -------------+----------------------------------------------------------------
      unexplained1 |
               age |   4.380161   1.439886     3.04   0.002     1.558035    7.202286
             _cons |  -4.380161    1.41094    -3.10   0.002    -7.145551    -1.61477
      -------------+----------------------------------------------------------------
      unexplained2 |
               age |   4.112639   1.254368     3.28   0.001     1.654123    6.571155
             _cons |   .0608593   1.131344     0.05   0.957    -2.156535    2.278254
      ------------------------------------------------------------------------------
      
      . reg realwage age if foreignborn == 0 
      
            Source |       SS           df       MS      Number of obs   =       287
      -------------+----------------------------------   F(1, 285)       =     32.59
             Model |  4260.62796         1  4260.62796   Prob > F        =    0.0000
          Residual |  37257.7954       285  130.729107   R-squared       =    0.1026
      -------------+----------------------------------   Adj R-squared   =    0.0995
             Total |  41518.4234       286  145.169312   Root MSE        =    11.434
      
      ------------------------------------------------------------------------------
          realwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |   .3035583    .053173     5.71   0.000     .1988966      .40822
             _cons |   7.258667   1.930273     3.76   0.000     3.459267    11.05807
      ------------------------------------------------------------------------------
      
      . reg realwage age if foreignborn == 1 
      
            Source |       SS           df       MS      Number of obs   =       379
      -------------+----------------------------------   F(1, 377)       =      4.04
             Model |  318.397693         1  318.397693   Prob > F        =    0.0450
          Residual |  29680.2426       377  78.7274338   R-squared       =    0.0106
      -------------+----------------------------------   Adj R-squared   =    0.0080
             Total |  29998.6403       378  79.3614821   Root MSE        =    8.8728
      
      ------------------------------------------------------------------------------
          realwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               age |   .0735626   .0365793     2.01   0.045     .0016376    .1454875
             _cons |   11.57797   1.554735     7.45   0.000     8.520929    14.63501
      ------------------------------------------------------------------------------
      
      . 
      . // Formula: Y1 - Y2 = (X1 - X2)*B + X1*(B1 - B) + X2*(B - B2)
      . // Explained: -1.157924 = (34.25-40.81)*B, solving for B: 0.176349628
      . // Unexplained 1 (in favor of Group 1): 7.11E-15 = 34.25*(0.3035583 - B), solving for B = 
      > 0.3035583
      . // Unexplained 2 (against Group 2): 4.173499 = 40.81*(B - 0.0735626), solving for B = 0.17
      > 5820257
      [/CODE]

      I'd be very grateful for any help with this!



      • #4
        Eliza, you inferred B in an incomplete way, as the total unexplained portion includes the constant term as well. If we only focus on "age", then it would be

        Code:
        Explained: -1.157924 = (34.247490 - 40.813560) * B, solving for B = 0.17634963
        Unexplained 1: 4.380161 = 34.247490 * (0.3035583 - B), solving for B = 0.17566101
        Unexplained 2: 4.112639 = 40.813560 * (B - 0.0735626), solving for B = 0.17432909
        B is actually the coefficient of age in the full sample regression. You may easily obtain its value by regressing realwage on constant and age for all observations, or simply by adding the option "noisily" to the oaxaca command. The values of B backed out from the three equations above are not identical, probably due to rounding error, or because you didn't restrict the sample to the regression sample while computing the average of age by foreignborn.
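The arithmetic above can be reproduced with Stata scalars and no data in memory. This is a minimal sketch that only reuses numbers printed in #3; the scalar names (x1, b1, B_expl, and so on) are made up for illustration:

```stata
* Group means of age and group-specific age coefficients, copied from #3
scalar x1 = 34.247490
scalar x2 = 40.813560
scalar b1 = 0.3035583
scalar b2 = 0.0735626

* Back out B from each age-only term of the decomposition
scalar B_expl = -1.157924 / (x1 - x2)
scalar B_u1   = b1 - 4.380161 / x1
scalar B_u2   = b2 + 4.112639 / x2
display %10.8f scalar(B_expl) "  " %10.8f scalar(B_u1) "  " %10.8f scalar(B_u2)

* Each unexplained total is the age term plus the _cons term
scalar u1_total = 4.380161 + (-4.380161)
scalar u2_total = 4.112639 + 0.0608593
display %9.6f scalar(u1_total) "  " %9.6f scalar(u2_total)
```

The two totals match the unexplained1 and unexplained2 rows of the "overall" panel up to rounding, which is why dividing those overall rows by the mean of age alone does not recover B.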



        • #5
          I see - thank you, Fei! One follow-up: how does one regress the DV on the constant and the IV, as you suggest, in Stata? I've searched for a command like this but didn't find one. If I simply run the full sample (reg realwage age), the coefficient comes out as 0.1345352, so I'm guessing that including the constant as you mentioned will get to the 0.17 B value.



          • #6
            Eliza, I missed one part in #4: the full-sample regression should be of realwage on the constant, age, and foreignborn. Running oaxaca with the "noisily" option would display the correct full-sample regression.
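As a sketch of the commands this describes, assuming the dataset from #3 is in memory with the variables realwage, age, and foreignborn (0/1), and that the user-written oaxaca command is installed:

```stata
* Full-sample regression including the group indicator; per #6, the
* age coefficient from this model is the B used with "pooled"
regress realwage age foreignborn

* Full-sample regression without the group indicator, as tried in #5;
* its age coefficient is not the B used in the decomposition
regress realwage age

* Ask oaxaca to display the regressions it estimates internally
oaxaca realwage age, by(foreignborn) pooled split noisily
```

regress always includes a constant unless "noconstant" is specified, so no extra step is needed to add one. The difference between the two regress calls is only the foreignborn term.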



            • #7
              Thank you, Fei - I really appreciate your help!



              • #8
                Originally posted by Eliza Schmidt (post #3 above)
                Hello, could you share a script showing how to apply the Oaxaca-Blinder decomposition model?
