  • Problem with a counterfactual analysis of the Theil-index decomposition

    Hi Statalist,
    I'm writing a paper on changes in wage inequality over time in relation to educational homogamy (i.e., the tendency to marry someone with the same education as you). To analyze this I'm using the Theil index to do a decomposition/counterfactual analysis of changes in wage inequality. My problem is that while the main counterfactual Theil value is correct, the counterfactual between/within values are not.

    The Theil index is decomposable into within- and between-group inequality, which is very useful for my purpose, as I am primarily interested in the inequality between groups with different educational combinations. To calculate the Theil value you need three components: p = group distribution, x = mean income, and y = within-group inequality. My intention is to perform this decomposition on data from 2000 and 2010 and then combine components from both years to create a counterfactual T-value. This allows me to see what would have happened to wage inequality if only p, the distribution of groups, had changed between 2000 and 2010. To perform the original decomposition I use the ineqdeco module from SSC.

    So far I have managed to produce a counterfactual Theil value; however, I have not been able to produce counterfactual within/between-group T-values. The code I use is below.


    Code:
    ineqdeco wage00, by(educ1)
    foreach x of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
    local v_`x' =  r(v_`x')
    }
    
    ineqdeco wage10, by(educ2)
    foreach x of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
    local GE_alpha1_`x' = r(ge1_`x')
    }
    
    foreach x of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
    local s_`x' =  r(lambda_`x')
    }
    
    foreach num of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
    local my_GE`num' = `v_`num'' * log(`s_`num'')
    }
    
    local my_GEglobal =`my_GE1'+`my_GE2'+`my_GE3'+`my_GE4'+`my_GE5'+`my_GE6'+`my_GE7'+`my_GE8'+`my_GE9'+`my_GE10'+`my_GE11'+`my_GE12'+`my_GE13'+`my_GE14'+`my_GE15'
    
    display `my_GEglobal'
    where s_`x' = relative mean income, v_`x' = distribution of households (population shares), and GE_alpha1_`x' = subgroup indices

    In the last foreach loop I put in any of the lines posted below to produce the counterfactual T/between/within values. I know that the first line works because I have tested it using data from 2000 only, which produces the same T-value as the standard decomposition.

    However, when I try to use the code to produce the counterfactual between-group inequality I always get a negative value; as the Theil index can't go below 0, I know this to be incorrect. And for within, when I apply it to data from only one year I get a different value from the ordinary decomposition using ineqdeco. So I know that the counterfactual main T-value is correct but the other two are incorrect, and I cannot figure out why.

    Code:
    local my_GE`num' =`s_`num'' * local my_GE`num' = `GE_alpha1_`num'' + `v_`num'' * `s_`num''* log(`s_`num'') // Theil value
    
    local my_GE`num' =`GE_alpha1_`num'' + `v_`num'' * log(`s_`num'')       // Calculates within-group Theil value
    
    local my_GE`num' =`v_`num'' * log(`s_`num'')        // Calculates between-group Theil value
    I should add that I did not write this code; I got it from a researcher I contacted who has worked in a similar field. I am not really used to working with Stata or quantitative topics, so I am a bit out of my depth here. I apologize if I have broken any Statalist netiquette. I appreciate any help or input; thanks in advance.

  • #2
    Check carefully that you are using the correct formulae to generate the Within-Group and Between-Group inequality components. (Look at the formulae in the help-file.) At first glance -- I don't have time to check in detail -- it appears that some of your code may implement the wrong formulae. (For instance, Within-Group inequality in the case of the Theil (GE(1)) index is the sum of the products of group income-shares and group inequalities. You appear to be using the group population shares, not the group income shares, in "myGE ...".)



    • #3
      Originally posted by Stephen Jenkins
      Check carefully that you are using the correct formulae to generate the Within-Group and Between-Group inequality components. (Look at the formulae in the help-file.) At first glance -- I don't have time to check in detail -- it appears that some of your code may implement the wrong formulae. (For instance, Within-Group inequality in the case of the Theil (GE(1)) index is the sum of the products of group income-shares and group inequalities. You appear to be using the group population shares, not the group income shares, in "myGE ...".)
      Thank you so much for the suggestion. I've looked through the help file and my formulae were indeed wrong, it seems. I tried the formulae below and they work when I apply them to data from one year only, where I can double-check the result with ineqdeco. However, when I use them for my counterfactual analysis and hold everything but mean wage (s_`x') constant, I get a negative value for the between-group T-value.

      Could the problem be that I calculate within-group inequality manually instead of using the macro GE_W(a)? In the literature using this method (Breen & Andersen, 2012), they use mean income, group distribution and within-group inequality rather than subgroup indices (GE_k), although I am unsure how exactly they perform the counterfactual analysis.

      Code:
      local my_GE`num' = `v_`num'' * `s_`num'' * `GE_alpha1_`num'' + `v_`num'' * `s_`num''* log(`s_`num'')        // Theil
      
      local my_GE`num' = `v_`num'' * `s_`num'' * `GE_alpha1_`num''                            // Within
      
      local my_GE`num' = `v_`num'' * `s_`num''* log(`s_`num'')                      // Between
      Thanks again for your help.

      I reiterate that any comment or input is helpful!


      References:
      Breen, R., & Andersen, S. H. (2012). Educational assortative mating and income inequality in Denmark. Demography, 49, 867–887.



      • #4
        As Breen & Andersen point out, overall inequality measured by the Theil index can be written in terms of the subgroup inequalities (Theil indices), the subgroup population shares and the subgroup means. The subgroup population shares and means together define the subgroup income shares. [This has long been well known; it's how the decomposition theorists developed their results. See also Jenkins (1995) cited in the ineqdeco help file for an application to UK inequality trends.]

        ineqdeco provides you with all those elements, and saves them in r() -- as you know. To do Breen-Andersen type counterfactuals, you need to run ineqdeco once for each of the 2 years being compared. Ensure that the local macros you create from the stored results are named so that you can identify the relevant distributional element (mean, inequality, pop share, etc.), the group id, and the year id. With all these elements in memory, you should be able to do your counterfactuals. Match your elements with the corresponding ones in Breen-Andersen's equations 4 and 5a-5c.

        [This sort of exercise is precisely the sort of thing ineqdeco was designed to do; I wrote the first version after publishing Jenkins (1995) and deciding there must be a better way than what I used for that paper!]



        • #5
          Originally posted by Stephen Jenkins
          As Breen & Andersen point out, overall inequality measured by the Theil index can be written in terms of the subgroup inequalities (Theil indices), the subgroup population shares and the subgroup means. The subgroup population shares and means together define the subgroup income shares. [This has long been well known; it's how the decomposition theorists developed their results. See also Jenkins (1995) cited in the ineqdeco help file for an application to UK inequality trends.]

          ineqdeco provides you with all those elements, and saves them in r() -- as you know. To do Breen-Andersen type counterfactuals, you need to run ineqdeco once for each of the 2 years being compared. Ensure that the local macros you create from the stored results are named so that you can identify the relevant distributional element (mean, inequality, pop share, etc.), the group id, and the year id. With all these elements in memory, you should be able to do your counterfactuals. Match your elements with the corresponding ones in Breen-Andersen's equations 4 and 5a-5c.

          [This sort of exercise is precisely the sort of thing ineqdeco was designed to do; I wrote the first version after publishing Jenkins (1995) and deciding there must be a better way than what I used for that paper!]
          Thanks again for the reply, Prof. Jenkins; I appreciate your time and help. I've looked through the help file and various publications on the Theil index, including yours, over the past two weeks, and I seem to have mixed up some concepts in my original posts.

          However, even after going through everything, my problem persists: when I use subgroup population shares and inequalities (indices) from T-1 (2000) but subgroup mean income from T-2 (2010), I get a negative between-group inequality value.

          I tried to calculate the between-group value in different ways, even manually using the relative mean wage and population share from ineqdeco, but no matter what I try I get the same negative between-group inequality value (-0.047). As I understand it, weighting the log of the ratio of subgroup income shares to subgroup population shares by income shares guarantees that the value is positive, but could this change in some counterfactual setting? When the subgroup income share is calculated from mean income from one year and subgroup population share from another year, could you get a negative value in some situations? Or is it strictly impossible? Is there any other reason I could get a negative result?

          I've tried several different ways of arriving at the between-group value, as shown in the code below, but I still get the same negative value. When I apply the model to one year I get exactly the same Theil/within/between values as when I run ineqdeco. I cannot for the life of me figure out why I get a negative between-group inequality value.

          Thank you again. I appreciate your time and apologize if I've made some other mix-up; I'm not trained in economics or econometrics.

          Code:
          ineqdeco wage00, by(educ1)
          foreach x of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
          local v_`x' =  r(v_`x')
          }
          
          foreach x of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
          local GE_alpha1_`x' = r(ge1_`x')
          }
          
          ineqdeco wage10, by(educ2)
          foreach x of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
          local s_`x' =  r(lambda_`x')
          }
          
          foreach num of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
          local my_GE`num' = (`v_`num'' *  `s_`num'') * log(`s_`num'' * `v_`num'' /`v_`num'') 
          
          }
          
          local my_GEglobal =`my_GE1'+`my_GE2'+`my_GE3'+`my_GE4'+`my_GE5'+`my_GE6'+`my_GE7'+`my_GE8'+`my_GE9'+`my_GE10'+`my_GE11'+`my_GE12'+`my_GE13'+`my_GE14'+`my_GE15'
          
          display `my_GEglobal'
          
          
          // I calculate between group inequality above, below are equations for within & alternative equations for between.
          
          *Theil = `v_`num'' * `s_`num''* log(`s_`num'') + `v_`num'' * `s_`num'' * `GE_alpha1_`num''
          
          *Within = `v_`num'' * `s_`num'' * `GE_alpha1_`num''
          
          
          // Below are other formulas for calculating between group inequality, i get the same negative value for all of these.
           
           `v_`num'' * `s_`num''* log(`s_`num'') // Between 1
          //^above: s_=relative mean
          
          `v_`num'' * `s_`num''/`t_`num'' * log(`s_`num''/`t_`num'') // Between 2 
          // ^above: s_=subgroup mean & t_=overall mean
          
          (`v_`num'' *  (`s_`num''/`t_`num'')) * log(((`s_`num''/`t_`num'') * `v_`num'') /`v_`num'')  // Between 3
          // ^above: s_=subgroup mean & t_=overall mean
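
          On the sign question: the between-group term is non-negative only when the weights are genuine income shares, i.e. when the population shares times the relative means sum to 1. Mixing population shares from one year with relative means from another generally breaks that, so a negative value is possible. A minimal sketch with invented numbers for two hypothetical groups (none of these values come from the data above):

```stata
* Hypothetical two-group example; all numbers are invented for illustration
local v1 = 0.5      // year-0 population share, group 1
local v2 = 0.5      // year-0 population share, group 2
local lam1 = 0.5    // year-1 relative mean (group mean / year-1 overall mean)
local lam2 = 1.1

* Naive mix: the implied "income shares" v*lambda sum to 0.8, not 1
local sumshare  = `v1'*`lam1' + `v2'*`lam2'
local btw_naive = `v1'*`lam1'*ln(`lam1') + `v2'*`lam2'*ln(`lam2')

* Rescale the relative means by the counterfactual overall (relative) mean
local r1 = `lam1'/`sumshare'
local r2 = `lam2'/`sumshare'
local btw_cf = `v1'*`r1'*ln(`r1') + `v2'*`r2'*ln(`r2')

display "sum of implied shares = " `sumshare'
display "naive between-group   = " `btw_naive'
display "rescaled between-group = " `btw_cf'
```

          With these numbers the naive mix gives roughly -0.12, while dividing the relative means by their share-weighted sum gives about 0.07. The within-group weights would need the same rescaling.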



          • #6
            Sorry, but I really don't want to look at your code details. (I don't have time.) But you appear to have not grasped the essential elements of the counter-factual exercise. It's as follows:


            For each year (time subscript suppressed),

            T = Sum(g=1,..,G) { s_g * T_g } + Sum(g=1,..,G) { s_g * log(mean_g/mean) }
            I.e. Within-group inequality + Between-group inequality

            where s_g is subgroup income share, mean_g is subgroup mean, and T_g is subgroup Theil
            But note that s_g = v_g * mean_g / mean where v_g is subgroup population share

            Hence T = Sum(g=1,..,G) { ( v_g * mean_g / mean)*T_g }
            + Sum(g=1,..,G) { ( v_g * mean_g / mean)* log(mean_g/mean) }

            But note also that mean = Sum(g=1,..,G) { v_g * mean_g }; so substitute this in the last eqn for T. Doing this means that ...

            ... Now we have T = f( v_1, ..., v_G, mean_1, ..., mean_G, T_1, ..., T_G).

            In words: we can write the Theil index for a given year as a function of the subgroup population shares, subgroup means, and subgroup inequalities. Each of these elements is saved in r() after running -ineqdeco-
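
            Written out, the substitution described above would give something like the following, where \mu_g stands for mean_g (notation assumed here for compactness, not taken from the post):

```latex
\[
T \;=\; \sum_{g=1}^{G} \frac{v_g\,\mu_g}{\sum_{h=1}^{G} v_h\,\mu_h}\; T_g
\;+\; \sum_{g=1}^{G} \frac{v_g\,\mu_g}{\sum_{h=1}^{G} v_h\,\mu_h}\;
\log\!\left( \frac{\mu_g}{\sum_{h=1}^{G} v_h\,\mu_h} \right)
\]
```

            The overall mean no longer appears as a separate input: it is rebuilt from whichever v_g and \mu_g are plugged in, so the income-share weights always sum to 1.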

            Counterfactual exercises to look at factors accounting for inequality change between year 0 and year 1 could include
            (a) calculate T assuming subgroup means and subgroup population shares are at t=0 values but subgroup inequalities are at t = 1 values
            (b) calculate T assuming subgroup inequalities and subgroup population shares are at t=0 values but subgroup means are at t = 1 values
            (c) calculate T assuming subgroup means and subgroup inequalities are at t=0 values but subgroup population shares are at t = 1 values

            NB one could also do the reverse (start with t=1 and calculate counterfactual t=0 values)

            To implement the counterfactual exercise, you need code that (i) first runs -ineqdeco, by()- for each of the 2 years; then (ii) calculates counterfactual Theil by reassembling the elements using the formula, substituting in counterfactual values as appropriate
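
            Steps (i) and (ii) might be sketched as follows for counterfactual (b). This is only an illustration under stated assumptions: it borrows the variable names wage00, educ1, wage10 and educ2 from the earlier posts, assumes the groups are coded 1 to 15 in both years, and relies on the stored results r(v_k), r(mean_k) and r(ge1_k) that appear elsewhere in this thread. The local macro names are hypothetical.

```stata
* (i) Year 0: keep population shares, subgroup means, subgroup Theil indices
ineqdeco wage00, by(educ1)
forvalues g = 1/15 {
    local v0_`g' = r(v_`g')
    local m0_`g' = r(mean_`g')
    local T0_`g' = r(ge1_`g')
}

* (i) Year 1: same elements, stored under different names
ineqdeco wage10, by(educ2)
forvalues g = 1/15 {
    local v1_`g' = r(v_`g')
    local m1_`g' = r(mean_`g')
    local T1_`g' = r(ge1_`g')
}

* (ii) Counterfactual (b): year-0 shares and inequalities, year-1 means
* First rebuild the overall mean from the mixed elements
local mean_cf = 0
forvalues g = 1/15 {
    local mean_cf = `mean_cf' + `v0_`g'' * `m1_`g''
}

* Then reassemble within and between terms with counterfactual income shares
local wth_cf = 0
local btw_cf = 0
forvalues g = 1/15 {
    local s = `v0_`g'' * `m1_`g'' / `mean_cf'
    local wth_cf = `wth_cf' + `s' * `T0_`g''
    local btw_cf = `btw_cf' + `s' * ln(`m1_`g'' / `mean_cf')
}
display "within = " `wth_cf' "  between = " `btw_cf' "  total = " `wth_cf' + `btw_cf'
```

            Counterfactuals (a) and (c) would follow the same pattern with a different choice of year-0 and year-1 elements in the last two loops.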

            Other remarks.

            (a) you can change

            Code:
            foreach x of numlist 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 {
            to
            Code:
            forvalues g = 1/15 {
            (b) It's much less complicated to do this sort of counterfactual exercise using MLD = ge(0) rather than Theil = ge(1)

            (c) Jenkins (1995) did counterfactual exercises (a)-(c) when looking at UK inequality trends, but also used the decomposition formula of Mookerjee & Shorrocks (see -ineqdeco- References for full citations)

            (d) no guarantees that the formulae above are 100% correct. (I'm doing this from first principles in a bit of a rush.)
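
            On remark (b): in the standard decomposition of the mean log deviation the weights are population shares only, which is presumably why the counterfactuals are simpler. A sketch, with \mu_g denoting the subgroup mean and \mu the overall mean (assumed notation):

```latex
\[
\mathrm{MLD} \;=\; \sum_{g=1}^{G} v_g\,\mathrm{MLD}_g
\;+\; \sum_{g=1}^{G} v_g \,\log\!\left( \frac{\mu}{\mu_g} \right),
\qquad \mu = \sum_{g=1}^{G} v_g\,\mu_g
\]
```

            Here the subgroup means enter only the between-group term, so swapping in another year's means leaves the within-group term unchanged.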



            • #7
              Dear Prof. Jenkins,
              Thanks again for all your comments and tips. I believe I have finally figured out how the index works. Unfortunately I have not understood how it is supposed to work in Stata but, using the indices, population shares and mean wages from ineqdeco, I have been able to perform the counterfactual in Excel instead. A bit more cumbersome, but not very much so.

              Thanks for your help again, and best regards.




                • #9
                  Well, very good that you've made progress; but unfortunate that you've had to resort to Excel. I wrote the programs in order that I avoid doing that!

                  To spell things out further, as I said, all the various components you need are stored in r() macros after running -ineqdeco-. Pretty much all you need to do (!) is run -ineqdeco- for "year 1" and then save those components into other local macros (so that they won't be overwritten when you run -ineqdeco- again). E.g. r(ge1_1) contains the Theil index for the 1st group, and you could hold that as
                  Code:
                  local ge1_1_year1 = r(ge1_1)
                  and so on for all the other components. Then repeat for year 2, including (for example)
                  Code:
                  local ge1_1_year2 = r(ge1_1)
                  and so on for all the other components.

                  You'll now have all the components in local macros, and can use them to calculate whatever counterfactuals you want.



                  • #10
                    If you are still interested in computing the Theil index in Stata, here is some sample code:

                    Code:
                    sysuse nlsw88.dta,clear
                    * Wage inequality by race
                    ineqdeco wage, bygroup(race)
                    display r(ge1) // Theil index
                    forvalues i=1/3 {
                    local v_`i' = r(v_`i') // subgroup population share
                    local s_`i' = r(lambda_`i') //relative mean
                    display `v_`i''
                    display `s_`i''
                    local btw_`i' = (`v_`i''*`s_`i'') * log(`s_`i'') //between components for each subgroup
                    display `btw_`i''
                    local wth_`i' = (`v_`i''*`s_`i'') * r(ge1_`i') // within components for each subgroup
                    }
                    local btw = `btw_1'+`btw_2'+`btw_3' // between inequality
                    local wth = `wth_1'+`wth_2'+`wth_3' // within inequality
                    display `btw'
                    display `wth'
                    display `btw' + `wth'



                    • #11
                      Also, if you want to estimate the counterfactual values, you need to recompute the overall mean as the sum of the subgroup means weighted by the subgroup proportions in the reference period.
                      Below is a hypothetical example where we hold the distribution of racial groups constant at its 2000 values to estimate the counterfactual Theil index in 2010.

                      Code:
                      ineqdeco wage if year == 2000, bygroup(race)
                      forvalues i=1/3 {
                      local v_`i'00 =  r(v_`i')
                      }
                      
                      ineqdeco wage if year == 2010, bygroup(race)
                      forvalues i=1/3 {
                      local v_`i' =  r(v_`i') // subgroup population share
                      local m_`i' = r(mean_`i') //subgroup mean
                      local pm_`i'= `m_`i'' * `v_`i'00' //sum of subgroup mean weighted by the proportion in the reference year
                      }
                      
                      local pm = `pm_1'+`pm_2'+`pm_3'
                      
                      forvalues i=1/3 {
                      local s_`i' = `m_`i''/`pm'
                      display `s_`i''
                      local btw_`i' = (`v_`i'00'*`s_`i'') * log(`s_`i'') //between components for each subgroup
                      display `btw_`i''  
                      local wth_`i' = (`v_`i'00'*`s_`i'') * r(ge1_`i') // within components for each subgroup
                      }
                      local btw_cf = `btw_1'+`btw_2'+`btw_3' // between inequality, counterfactual
                      local wth_cf = `wth_1'+`wth_2'+`wth_3' // within inequality, counterfactual
