  • Decompose Theil index for data including zero values (no income/earnings)?

    Hello, I want to decompose the overall inequality measured by the Theil index into between- and within-group inequality. The problem is that my data contain persons with zero earnings, and they are automatically excluded from the decomposition analysis if using Jenkins's ineqdec0. Is there any other user-written command that can provide a group decomposition analysis of the Theil index for data including zero values?

    I am aware that there is a debate about whether Theil's index can be applied to data with zero values, for ln(0) would be undefined. However, I am on the other side of the debate and believe that Theil's index can be applied to data with zero values. Those zero cases would be treated as missing and do not contribute to most of the calculation, BUT they still contribute to the group size.

    I do not know Stata programming, so I cannot modify Jenkins's ineqdec0 ado file myself.

    Thank you!

  • #2
    You can consider replacing zeros with a small positive value, for example with 1 or 0.01, etc.

    I hope this helps,
    Michal



    • #3
      Shem Shen wrote:
      I am on the other side of the debate and believe that Theil's index can be applied to data with zero values. Those zero cases would be treated as missing and do not contribute to most of the calculation, BUT they still contribute to the group size.
      This statement needs elaboration. I am not aware that there is "another side of the debate". You cannot take logs of zero, full stop ("period" if you're North American).

      The Theil index, like all generalised entropy inequality indices, is additively decomposable by subgroup. So, you might think that you could say Total inequality = inequality(those with +ve earnings) + inequality(those with earnings = 0) + inequality between these 2 groups. But the between-group inequality term is calculated by attributing each person with the mean for their group (fine) ... except that to evaluate the term for the Theil index, one has to then take logs of those means. For the "earnings = 0" group, you're back to trying to take the logarithm of zero.
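
For reference, the subgroup decomposition described above can be written out as follows. This is the standard textbook form; the symbols (N_k and mu_k for the size and mean of group k, T_k for the Theil index within group k, N and mu for the whole population) are notation assumed here, not taken from the post.

```latex
T \;=\; \underbrace{\sum_{k} \frac{N_k}{N}\,\frac{\mu_k}{\mu}\,T_k}_{\text{within groups}}
  \;+\; \underbrace{\sum_{k} \frac{N_k}{N}\,\frac{\mu_k}{\mu}\,\ln\frac{\mu_k}{\mu}}_{\text{between groups}}
```

For the "earnings = 0" group, mu_k = 0, so the between-group term contains ln(0), which is the difficulty the paragraph above points to.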

      Michal's proposal will allow calculation of Theil indices (using e.g. ineqdeco on SSC), but you need to think seriously about whether this is sensible in your context. The log of a small fraction is a negative number large in magnitude. The log of one is zero. If you include these values in with the rest of the distribution, it is like adding "dirt" to the data and can easily lead to non-robust results in both the metaphorical and technical sense: see e.g. Cowell, F. A. and Victoria-Feser, M.-P. (1996). Robustness properties of inequality measures. Econometrica, 64 (1), 77–101.
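
How much the choice of replacement value matters depends on the index. The following is a minimal Python sketch (not Stata, and not the ineqdeco implementation) that replaces zeros with different small values and recomputes the Theil T and the mean log deviation (MLD). The earnings figures and the function names are hypothetical, chosen only for illustration.

```python
import math

def theil_t(values):
    """Theil T (GE(1)) for strictly positive values."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)

def mean_log_deviation(values):
    """Mean log deviation (GE(0)) for strictly positive values."""
    mu = sum(values) / len(values)
    return sum(math.log(mu / x) for x in values) / len(values)

# Hypothetical earnings: three zeros plus seven positive values.
earnings = [0, 0, 0, 12000, 18000, 25000, 30000, 42000, 60000, 95000]

results = {}
for eps in (1, 0.01, 0.0001):
    # Replace each zero with the small positive value eps.
    patched = [x if x > 0 else eps for x in earnings]
    results[eps] = (theil_t(patched), mean_log_deviation(patched))
    print(f"eps={eps:<7} Theil T={results[eps][0]:.4f}  MLD={results[eps][1]:.4f}")
```

In this made-up sample the Theil T hardly moves with eps, because (x/mu) * ln(x/mu) tends to zero as x tends to zero, whereas the MLD keeps growing as eps shrinks. Whether either number is meaningful for a given dataset is a separate question.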

      On the other hand, if the zero values are potentially genuine zeros in your context, then you might need to move away from standard "relative" measures of inequality like the Theil and Gini indices. (This is similar to the consideration of the inequality of "net worth", for which zero and negative values are possible and may be prevalent.) On this, see e.g. Jäntti and Jenkins, ‘Methods for summarizing and comparing wealth distributions’, ISER Working Paper 2005-05, https://www.iser.essex.ac.uk/publica...s/iser/2005-05.

      Thinking further: if you're talking about labour market earnings specifically, then "zero earnings" typically means "not in paid work", and it would be very unusual to include those not in paid work with those in paid work when assessing earnings inequality. (As it happens, there is a small literature that does do this, but it imputes earnings to the non-paid-workers, e.g. using income from unemployment insurance or other social security benefits. Genuine zeros for this more broadly-defined "earnings +" variable would be unusual.)

      If you really do wish to persist with including individuals with zero earnings when summarising earnings dispersion, then the easiest approach would be to use percentile ratio measures, e.g. the ratio of the 90th percentile to the 10th percentile.
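
A percentile ratio of this kind can be sketched in a few lines of Python. The data and the function name `p90_p10_ratio` are hypothetical, and the percentile definition is whatever `statistics.quantiles` uses by default, which may differ slightly from Stata's. Note one assumption built into the sketch: if at least 10% of the sample has zero earnings, the 10th percentile is itself zero and the ratio is undefined, so the function returns None in that case.

```python
import statistics

def p90_p10_ratio(values):
    """Ratio of the 90th to the 10th percentile; None if the 10th percentile is not positive."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points: p10, p20, ..., p90
    p10, p90 = deciles[0], deciles[-1]
    if p10 <= 0:
        return None  # ratio undefined when the lower percentile is zero
    return p90 / p10

# Hypothetical earnings for 20 people, one of whom has zero earnings.
earnings = [0, 8000, 11000, 14000, 16000, 18000, 20000, 22000, 24000, 26000,
            28000, 30000, 33000, 36000, 40000, 45000, 52000, 60000, 75000, 110000]
ratio = p90_p10_ratio(earnings)
print(ratio)

# Hypothetical sample in which 40% have zero earnings: the ratio is undefined.
mostly_zero = [0, 0, 0, 0, 5000, 9000, 12000, 20000, 30000, 50000]
print(p90_p10_ratio(mostly_zero))
```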



      • #4
        Thank you very much, Professor Jenkins!



        • #5
          Originally posted by Stephen Jenkins
          Shem Shen wrote:

          This statement needs elaboration. I am not aware that there is "another side of the debate". You cannot take logs of zero, full stop ("period" if you're North American).
          Dear Prof. Jenkins,

          On whether zeros can be included in the Theil computation, I found a recent article by Tim Liao that might be of interest to you:

          http://srd.sagepub.com/content/suppl...tensions16.pdf (go directly to Page 3)

          Liao, T. F. (2016). Evaluating Distributional Differences in Income Inequality. Socius: Sociological Research for a Dynamic World, 2, 1-14.



          • #6
            Originally posted by Michal Brzezinski
            You can consider replacing zeros with a small positive value, for example with 1 or 0.01, etc.

            I hope this helps,
            Michal
            Thank you, Michal. I hope the article above will also be helpful to you.



            • #7
              Shem: thanks for the reference to Tim Liao's paper. I discussed it with him extensively when it was in draft form.

              I am against Michal's proposal in general (though I might accept it in specific cases). Such transformations are artificial and can exert unduly high leverage on some indices. See my previous post for how I would proceed.



              • #8
                Originally posted by Stephen Jenkins
                Shem: thanks for the reference to Tim Liao's paper. I discussed it with him extensively when it was in draft form.

                I am against Michal's proposal in general (though I might accept it in specific cases). Such transformations are artificial and can exert unduly high leverage on some indices. See my previous post for how I would proceed.
                Thanks! However, may I ask what you think of Tim's claim that zero values can be included in the computation of the Theil index? It seems to conflict with your previous comment that cases with zero incomes should never be included in calculating the Theil index. Below is an excerpt from his follow-up comment on his 2016 Socius paper.

                http://journals.sagepub.com/doi/supp...tensions16.pdf

                "Inclusion of Zero Values in Theil’s Index

                A prevalent misconception exists in the application of Theil’s first measure and its decomposition (i.e., equations [1], [2], and [3]). That is, zero values cannot be included or analyzed because of the natural logarithm function in the formula. This misconception has been propagated by various publications and by several computer macros including those popular ones for both the R and the Stata platforms. The misconception is rather unfortunate especially since an inequality statistic is supposed to be able to measure the situation of extreme inequality, such as the case where one person has all the income while the others have none. In such a situation, the Gini Coefficient equals to 1, and Theil’s first measure equals to ln(N), the maximum value of Theil’s first measure given by Theil (1967). When sampling weights are present, the upper limit is $\ln\left(\sum_{i=1}^{N} w_i\right)$. For example, for a sample of 10 cases where one of them has all the income while the other have none, or $x_i = [0,0,0,0,0,0,0,0,0,1]$, the Theil first measure should produce log(10)=2.30. The inclusion of zero values is also consistent with Theil’s (1967, p. 91) conceptualization, when he defined income share to be $y_i \geq 0$. However, when the nine zero values are excluded, $T_T = \sum_{i=1}^{1} \frac{1}{1}\ln\frac{1}{1} = 0$ even though the correct result should be 2.30. The reason for the incorrect result lies in the incorrect application of information theory. Entropy or $H = \sum_{i=1}^{N} x_i \ln\frac{1}{x_i} = 0$ if all the $x_i$ but one are zero, with this one having the value unity (Shannon 1948, p. 394). Theil’s first measure is defined by ln(N)−H, or the maximum possible entropy minus the observed entropy (Theil 1967). Conceptually, entropy can be understood as random distribution (or equality) while order, inequality. Scholars of information theory in later years clarified the calculation involving log(0) even more clearly. According to information theory, 0log(0)=0 by definition (Cover and Thomas 2006, p. 19). Therefore, the simple dataset with all the zero values included will produce the result of $T_T = \frac{1}{10}\sum_{i=1}^{10}\frac{x_i}{\bar{x}}\ln\frac{x_i}{\bar{x}} = \frac{1}{10}\left(0+0+0+0+0+0+0+0+0+10\ln 10\right) = \ln(10) = 2.30$ if we apply this information theory principle"
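
The ten-case example in the excerpt can be checked numerically. The following is a minimal Python sketch, assuming the usual unweighted form of Theil's T (the mean of (x/mu) * ln(x/mu)) and applying the 0*log(0) = 0 convention by simply skipping zero values in the sum. The function name is hypothetical and this is not code from the paper or from any Stata or R package.

```python
import math

def theil_t_zero_convention(values):
    """Unweighted Theil T with the convention 0 * ln(0) = 0.

    Zero values count towards the sample size and the mean,
    but add nothing to the sum.
    """
    if any(x < 0 for x in values):
        raise ValueError("values must be non-negative")
    n = len(values)
    mu = sum(values) / n
    if mu <= 0:
        raise ValueError("mean must be positive")
    total = 0.0
    for x in values:
        if x > 0:  # zero terms contribute 0 by convention
            total += (x / mu) * math.log(x / mu)
    return total / n

# One person holds all the income, nine hold none.
sample = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
t_all = theil_t_zero_convention(sample)
t_dropped = theil_t_zero_convention([x for x in sample if x > 0])
print(t_all, math.log(10), t_dropped)
```

With the zeros kept, the result matches ln(10); with the zeros dropped, the single remaining case gives 0, as the excerpt describes.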



                • #9
                  may I ask what you think of Tim's claim that zero values can be included in the computation of the Theil index?
                  The standard argument is that any inequality index which involves taking the log of income can only be calculated for positive values. This includes the Theil (GE(1)) and Mean Log Deviation (GE(0)) indices.

                  This is not quite right in the sense that one can calculate a Theil index when there are zero values -- using a modified version of the standard formula. See, for example, Proposition 1 of Morrisson, C. and Murtin, F. (2013). The Kuznets curve of human capital inequality: 1870–2010. Journal of Economic Inequality, 11(3), 283–301.

                  The authors use well-known results about the additive decomposability of generalised entropy inequality indices by population subgroups; the 'trick' is to define all those with zeros as a separate subgroup (and then use limit theorem arguments). The same trick does not work for the MLD. It's long been known that the trick works for the Gini coefficient, because this is one case in which there are non-overlapping subgroups, and so the Gini is additively decomposable.
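
The subgroup argument can be illustrated numerically. If a share p of the population has positive values and T_pos is the Theil index among them, then working through the standard decomposition with the zeros as their own subgroup gives T = T_pos - ln(p). This identity is derived here from the decomposition and the 0*ln(0) = 0 limit; it is a sketch of the idea, not a transcription of Morrisson and Murtin's Proposition 1. The data and function names below are hypothetical.

```python
import math

def theil_t_positive(values):
    """Standard Theil T for strictly positive values."""
    mu = sum(values) / len(values)
    return sum((x / mu) * math.log(x / mu) for x in values) / len(values)

def theil_t_zero_convention(values):
    """Theil T over all values, treating 0 * ln(0) as 0."""
    n = len(values)
    mu = sum(values) / n
    return sum((x / mu) * math.log(x / mu) for x in values if x > 0) / n

# Hypothetical earnings: three zeros plus seven positive values.
earnings = [0, 0, 0, 12000, 18000, 25000, 30000, 42000, 60000, 95000]
positive = [x for x in earnings if x > 0]
p = len(positive) / len(earnings)  # share with positive earnings

via_subgroups = theil_t_positive(positive) - math.log(p)
direct = theil_t_zero_convention(earnings)
print(via_subgroups, direct)
```

The two routes agree up to floating-point error, so an existing routine that only accepts positive values could in principle be combined with the share of zeros to get the overall figure.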




                  • #10
                    Originally posted by Stephen Jenkins

                    The standard argument is that any inequality index which involves taking the log of income can only be calculated for positive values. This includes the Theil (GE(1)) and Mean Log Deviation (GE(0)) indices.

                    This is not quite right in the sense that one can calculate a Theil index when there are zero values -- using a modified version of the standard formula. See, for example, Proposition 1 of Morrisson, C. and Murtin, F. (2013). The Kuznets curve of human capital inequality: 1870–2010. Journal of Economic Inequality, 11(3), 283–301.

                    The authors use well-known results about the additive decomposability of generalised entropy inequality indices by population subgroups; the 'trick' is to define all those with zeros as a separate subgroup (and then use limit theorem arguments). The same trick does not work for the MLD. It's long been known that the trick works for the Gini coefficient, because this is one case in which there are non-overlapping subgroups, and so the Gini is additively decomposable.

                    Thank you Professor Jenkins! I will read Morrisson's paper.



                    • #11
                      Originally posted by Stephen Jenkins

                      The standard argument is that any inequality index which involves taking the log of income can only be calculated for positive values. This includes the Theil (GE(1)) and Mean Log Deviation (GE(0)) indices.

                      This is not quite right in the sense that one can calculate a Theil index when there are zero values -- using a modified version of the standard formula. See, for example, Proposition 1 of Morrisson, C. and Murtin, F. (2013). The Kuznets curve of human capital inequality: 1870–2010. Journal of Economic Inequality, 11(3), 283–301.

                      The authors use well-known results about the additive decomposability of generalised entropy inequality indices by population subgroups; the 'trick' is to define all those with zeros as a separate subgroup (and then use limit theorem arguments). The same trick does not work for the MLD. It's long been known that the trick works for the Gini coefficient, because this is one case in which there are non-overlapping subgroups, and so the Gini is additively decomposable.

                      Hi Professor Jenkins, sorry to bother you again. May I ask you one more question about Theil's T (first measure)? Say we have two populations A and B. During the same period, the Theil of people's incomes in A increased from 0.1 to 0.2, while the Theil of incomes in B increased from 0.2 to 0.3. The absolute amount of change in Theil is 0.1 in both cases, while the change measured in percentage is much higher for A than for B (0.1->0.2, 100% increase; 0.2->0.3, 50% increase only). Should we say that the change in the degree of inequality in both populations has been the same, because their Theils both increased by an absolute amount of 0.1?

                      It seems to me that many papers compare change in inequality across populations/periods using percentage change. But it is a bit confusing to me. For example, how can we say that a change in Theil from 0.01 to 0.02 is a much bigger change than a change from 0.50 to 0.51? It seems to me that the degree of inequality, as measured by Theil index, changed little in both cases. But if we insist on the percentage change approach, then we have to conclude that 0.01->0.02 is a much bigger change in degree of inequality (or much faster rate of growth) than 0.50->0.51.
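
The arithmetic behind the two ways of reporting change can be laid out side by side. This is a small Python sketch using only the figures quoted in this post; the function name and the labels are hypothetical.

```python
def describe_change(start, end):
    """Return (absolute change, percentage change) between two index values."""
    absolute = end - start
    percent = 100 * absolute / start
    return absolute, percent

# Start and end values of the Theil index taken from the examples above.
cases = {
    "A": (0.1, 0.2),
    "B": (0.2, 0.3),
    "small base": (0.01, 0.02),
    "large base": (0.50, 0.51),
}

changes = {name: describe_change(start, end) for name, (start, end) in cases.items()}
for name, (absolute, percent) in changes.items():
    print(f"{name:<11} absolute={absolute:.2f}  percent={percent:.0f}%")
```

The same absolute change gives very different percentage changes depending on the base, which is the source of the puzzle described here.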

                      I am sorry that this question is not about Stata. I became very confused about this issue when using your command ineqdec0 and reading your papers on decomposing trends in income inequality in the UK, so I thought you could answer my question. Thank you!



                      • #12
                        Interesting question, but I don't think there is a single right answer. The issue is really: how do we assess whether a change in inequality (however measured) is large or small or ...? (I'm referring to substantive change rather than statistically significant change here.) Absolute or percentage change? If the reader can see both base year and final year estimates, they can make up their own mind. You as a researcher can report all. Sometimes for numbers that are small (between 0 and 1) it makes more sense to

                        For the Gini coefficient -- the most commonly-used summary inequality measure in official statistics -- my mentor used to refer to a two percentage point change as a benchmark reference point for a large change on the grounds that such a large change was rare (in rich countries). One could benchmark changes in a Theil index in an analogous way.



                        • #13
                          Originally posted by Stephen Jenkins
                          Interesting question, but I don't think there is a single right answer. The issue is really: how do we assess whether a change in inequality (however measured) is large or small or ...? (I'm referring to substantive change rather than statistically significant change here.) Absolute or percentage change? If the reader can see both base year and final year estimates, they can make up their own mind. You as a researcher can report all. Sometimes for numbers that are small (between 0 and 1) it makes more sense to

                          For the Gini coefficient -- the most commonly-used summary inequality measure in official statistics -- my mentor used to refer to a two percentage point change as a benchmark reference point for a large change on the grounds that such a large change was rare (in rich countries). One could benchmark changes in a Theil index in an analogous way.
                          Thank you so much for your quick response!!! May I ask what you were trying to say after "Sometimes for numbers that are small (between 0 and 1) it makes more sense to"? It seems the end of that sentence was deleted accidentally.



                          • #14
                            ... focus on absolute changes. (And the 2 percentage point benchmark cited in the next paragraph referred to a change over one year.)



                            • #15
                              Originally posted by Stephen Jenkins
                              ... focus on absolute changes. (And the 2 percentage point benchmark cited in the next paragraph referred to a change over one year.)
                              Thank you!! I really appreciate your reply!
