  • Negative Ginis

    Dear all,

    I am computing inequality indices using the ineqdec0 command because my income variable includes negative and zero values. In this case, I know that the Gini index is no longer bounded between 0 and 1 and can go well above 1. However, it should still be an area measure, so negative values should not be possible. Nevertheless, when I apply the command I obtain negative values for my Gini coefficient.
    How is this possible?
    If I am wrong and negative values are reasonable, how can I report everything in positive terms?
    I attach my code, although I do not think it is very useful, as it is essentially a one-line command.

    ineqdec0 income [aw=weight] if country ==`x' & year ==`y'
    replace gini_country=r(gini) if country ==`x' & year ==`y'

    Best regards,
    Francesco


  • #2
    It is well known that you can get a negative estimate of the Gini coefficient when mean income is negative (e.g. because there are sufficiently many negative incomes). Is this the situation in your case?

    See inter alia, Amiel & Cowell, Inequality among the Kibbutzim, Economica, 63 (1996), S63-S85
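
    A minimal sketch of the point, using made-up toy data rather than the poster's data. It assumes the user-written ineqdec0 command is installed (it is distributed via SSC) and relies only on the r(gini) result already used in the first post:

    ```stata
    * Toy data (hypothetical values): most incomes are negative, so the mean is negative
    clear
    input income
    -10
    -5
    -1
    2
    end

    * Confirm that the mean is negative
    summarize income

    * Gini for a variable with negative values; the sign should follow the sign of the mean
    ineqdec0 income
    display "Gini = " r(gini)
    ```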



    • #3
      Originally posted by Stephen Jenkins (post #2)
      Many thanks for your reply.

      Actually, I believe that is not the case: negative incomes represent less than 1% of the sample. However, I'll check your reference for further insights.
      Thanks again



      • #4
        Your question really isn't clear without more detail, or at a minimum it is too difficult to guess at a good answer from what you have shared. Please help us help you. Show example data.

        If I understand correctly, for at least one combination of country and year, the value of gini_country is the same negative value for all observations with that country and year. Suppose this is true when country is "FRA" and the year is 2020. (You will have to modify the sample code below to reflect the actual coding of country and year.)

        If this combination of country and year has at most 100 observations, please provide the data using the dataex command. Run
        Code:
        dataex income weight country year if country=="FRA" & year==2020
        and then copy the output from the Stata Results window, starting with [CODE] and ending with [/CODE] and paste that into your next post.

        On the other hand, if the combinations all have more than 100 observations, choose one and run
        Code:
        summarize income [aw=weight] if country=="FRA" & year==2020, detail
        and copy the command and output from the Results window and paste it into your next post, surrounding it with CODE delimiters [CODE] and [/CODE].



        • #5
          Originally posted by William Lisowski (post #4)
          Thanks for your reply.
          Yes, my problem arises for several combinations of country and year. All combinations have more than 100 observations, so I used the second command as suggested. Many of the combinations behave similarly to the case I report below. I also noticed that, as suggested by Professor Jenkins, these distributions have a negative mean.

          Code:
          summarize income [aw= weight] if country==64 & year==2011, detail
          
                                   income
          -------------------------------------------------------------
                Percentiles      Smallest
           1%    -45911.53      -88132.44
           5%    -30338.26      -68206.58
          10%    -23878.67      -67864.48       Obs               1,959
          25%    -17888.38      -67056.28       Sum of Wgt.   3,198,385
          
          50%    -11359.77                      Mean          -11692.81
                                  Largest       Std. Dev.      11869.22
          75%    -5323.441       53911.83
          90%     1052.164       57327.38       Variance       1.41e+08
          95%     4493.156       62880.52       Skewness       .0457554
          99%     18880.22       68875.16       Kurtosis       9.045249

          I also noticed that when I apply top/bottom coding to the country distribution (i.e. replacing observations above p99 or below p1 with the value of the respective percentile), the number of combinations reporting a negative Gini decreases.



          • #6
            Stephen Jenkins is a world-class authority on income inequality, so usually it is pointless to comment in a thread he is watching. But, quickly: his comment wasn't really a suggestion. Of the many formulations of this measure, one has a numerator that is always non-negative (in practice always positive) and a denominator that is the mean. Therefore if the mean is negative, so also is the measure. High school mathematics, and end of algebraic story.

            What to do instead is, by comparison, an open question. It is interesting to me that thousands of people in a country in a year collectively have negative mean income, but no doubt this makes more sense with the full context. Evidently it's far from being a story of outliers, as somewhere between 75% and 90% of values are negative.
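
            For reference, one formulation with this property is the relative mean difference form, written here for weights w_i and incomes y_i. The notation is an assumption for illustration and is not taken from the ineqdec0 documentation:

            ```latex
            G \;=\; \frac{1}{2\mu}\,
                    \frac{\sum_{i}\sum_{j} w_i w_j \,\lvert y_i - y_j\rvert}
                         {\bigl(\sum_{i} w_i\bigr)^{2}},
            \qquad
            \mu \;=\; \frac{\sum_{i} w_i y_i}{\sum_{i} w_i}.
            ```

            The double sum is never negative, so the sign of G is the sign of \mu. As a made-up example with equal weights and incomes -10, -5, -1 and 2: the mean is -3.5, the mean absolute difference over all ordered pairs is 80/16 = 5, and so G = 5 / (2 x -3.5), roughly -0.71.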



            • #7
              Closing the loop, the demonstration I requested in post #4 and that you provided in post #5 strongly suggests some sort of problem with your data. When an analysis produces unexpected results, the usual first step is to see whether the data input to the analysis have unexpected characteristics. (Thank you for taking the effort to present the results clearly using CODE delimiters; not everyone does.)

              While you may have just 1% negative values in your data overall, which might be reasonable, it appears the negative values are in fact primarily in a small number of country/year combinations. I cannot imagine a reasonable country with over 75% of the population having a negative income in a given year using the common metrics for income.

              As Nick suggests, this seems to be a systematic data problem, not a matter of a few outliers that need to be trimmed.
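
              One way to see where the negative values are concentrated is sketched below. It assumes the variable names from the first post (income, weight, country, year); neg_income, mean_income and share_neg are hypothetical names introduced here for illustration:

              ```stata
              * Work on a temporary copy so the original data are restored afterwards
              preserve

              * Indicator for a negative income (missing if income is missing)
              generate byte neg_income = income < 0 if !missing(income)

              * Weighted mean income and weighted share of negative incomes per country-year
              collapse (mean) mean_income=income share_neg=neg_income [aw=weight], by(country year)

              * Country-year cells with a negative mean, where a negative Gini is to be expected
              list country year mean_income share_neg if mean_income < 0, sepby(country)

              restore
              ```

              Cells with a large share_neg are the ones whose data source and income definition deserve a closer look.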



              • #8
                OK, I will have a deeper look into the data and the sources.
                Thank you all for your comments!
