  • Why does the ginidesc command yield different Gini coefficients when combined with the "if" qualifier?

    Dear Statalist users,

    I am using Stata 14.1.

    I am trying to estimate wage inequality (by Gini coefficient) by different fields of graduation between 2009 and 2017.

    The following two commands yield the same Gini coefficients, where group_yearfield is a grouping variable that combines year and field.

    ineqdeco wage_real , by(group_yearfield)

    Subgroup indices: GE_k(a) and Gini_k
    group(year field_isced_agg)              GE(-1)    GE(0)    GE(1)    GE(2)     Gini
    2009 Education sciences                 0.02628  0.02516  0.02463  0.02462  0.12507
    2009 Arts, Humanities & Languages       0.23779  0.13729  0.09504  0.07554  0.20667
    2009 Social sciences and journalism     0.32876  0.20835  0.16636  0.15805  0.31587
    2009 Business Administration & Law      0.23097  0.16740  0.14945  0.15863  0.29832
    2009 Environment sc, math. & stat.      0.10441  0.08880  0.08085  0.07810  0.22543
    2009 ICT                                0.00000  0.00000  0.00000  0.00000  0.00000
    2009 Engineering, Manufacturing & Arc   0.18252  0.14354  0.12618  0.12286  0.27576
    2009 Agriculture & veterinary           0.12320  0.11102  0.10803  0.11267  0.25732
    2009 Health & Welfare                   0.10245  0.09145  0.08861  0.09267  0.23524
    2009 Services                           0.20181  0.15249  0.12777  0.11659  0.26928

    ginidesc wage_real , by(group_yearfield)

    K                                      Gini_k
    2009 Education sciences                 0.125
    2009 Arts, Humanities & Languages       0.207
    2009 Social sciences and journalism     0.316
    2009 Business Administration & Law      0.298
    2009 Environment sc, math. & stat.      0.225
    2009 ICT                                0.000
    2009 Engineering, Manufacturing & Arc   0.276
    2009 Agriculture & veterinary           0.257
    2009 Health & Welfare                   0.235
    2009 Services                           0.269

    Then I want to decompose wage inequality separately for each year, so I type the commands:

    ineqdeco wage_real if year==2009 , by(field)
    ginidesc wage_real if year==2009 , by(field)

    Here, the Gini coefficients by field from ineqdeco are the same as the results above:

    ineqdeco wage_real if year==2009 , by(field)

    Education sciences                         0.02628  0.02516  0.02463  0.02462  0.12507
    Arts, Humanities & Languages               0.23779  0.13729  0.09504  0.07554  0.20667
    Social sciences and journalism             0.32876  0.20835  0.16636  0.15805  0.31587
    Business Administration & Law              0.23097  0.16740  0.14945  0.15863  0.29832
    Environment sc, math. & stat.              0.10441  0.08880  0.08085  0.07810  0.22543
    ICT                                        0.00000  0.00000  0.00000  0.00000  0.00000
    Engineering, Manufacturing & Architectur   0.18252  0.14354  0.12618  0.12286  0.27576
    Agriculture & veterinary                   0.12320  0.11102  0.10803  0.11267  0.25732
    Health & Welfare                           0.10245  0.09145  0.08861  0.09267  0.23524
    Services                                   0.20181  0.15249  0.12777  0.11659  0.26928

    However, the Gini coefficients from ginidesc are different from the ones I got above for 2009:

    ginidesc wage_real if year==2009 , by(field)

    Education sciences                         0.235
    Arts, Humanities & Languages               0.298
    Social sciences and journalism             0.125
    Business Administration & Law              0.269
    Environment sc, math. & stat.              0.225
    ICT                                        0.000
    Engineering, Manufacturing & Architectur   0.276
    Agriculture & veterinary                   0.257
    Health & Welfare                           0.316
    Services                                   0.207

    What happens to the ginidesc command when I use the if qualifier?

    Many thanks in advance.
    Elif.

  • #2
    Has anyone dealt with a similar problem and could offer some advice?


    • #3
      Part of finishing my programs is writing a certification script (help cscript). I am not as diligent as the people from StataCorp are, but I find that even a minimal certification script that just checks if and in conditions and fweights can find many problems before I send it out to the public. To check for an if condition you would compare the results:

      Code:
      sysuse nlsw88, clear
      ginidesc wage if collgrad == 0, by(race)
      
      keep if collgrad == 0
      ginidesc wage, by(race)
      If ginidesc left its results behind in r() then I would have used savedresults to compare the two automatically. Anyhow, the results are different, so there must be a bug in ginidesc. This probably means that somewhere the author forgot an if `touse', which is a very easy mistake to make.
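
      As a minimal sketch of the kind of check described above: mymean is a hypothetical toy program invented purely for illustration (it is not ginidesc and says nothing about how ginidesc is written). It honours if and in through marksample touse, leaves its result in r(), and is then checked by comparing the if version against the keep version with assert.

      ```stata
      * Hypothetical toy program, for illustration only (not ginidesc)
      capture program drop mymean
      program define mymean, rclass
          version 14
          syntax varname(numeric) [if] [in]
          * touse = 1 for observations satisfying if/in with nonmissing varlist
          marksample touse
          * Forgetting "if `touse'" here is the easy mistake mentioned above
          quietly summarize `varlist' if `touse'
          return scalar mean = r(mean)
          return scalar N    = r(N)
      end

      sysuse nlsw88, clear

      * Version 1: restrict with an if qualifier
      mymean wage if collgrad == 0
      local m_if = r(mean)
      local n_if = r(N)

      * Version 2: drop the other observations first
      preserve
      keep if collgrad == 0
      mymean wage
      * The two routes must agree; assert stops with an error if they do not
      assert r(N) == `n_if'
      assert reldif(r(mean), `m_if') < 1e-12
      restore
      ```

      A program that ignored `touse' in the summarize line would fail both assertions, which is the point of keeping even a small check like this alongside the program.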

      ---------------------------------
      Maarten L. Buis
      University of Konstanz
      Department of history and sociology
      box 40
      78457 Konstanz
      Germany
      http://www.maartenbuis.nl
      ---------------------------------


      • #4
        Dear Maarten, what a wonderful example! Many thanks! And yes, the results by subgroup are different. I also noticed something else with this simple example: the numbers for the subgroups are the same, but they are attached to different groups when I use the if condition, as below!

        ginidesc wage if collgrad == 0, by(race)
        K      Gini_k
        white   0.248
        black   0.289
        other   0.324

        keep if collgrad == 0
        ginidesc wage, by(race)
        K      Gini_k
        white   0.324
        black   0.289
        other   0.248

        So, it is safer to keep first and make the decompositions rather than using the if condition.

        Many thanks!


        • #5
          I wrote my answer from the perspective of someone who writes user-written programs, not from the perspective of a user. From the perspective of a user my conclusion would be that there is something wrong with that program, and I would stay away from it till that has been fixed. The best way forward is to contact the author of that program.
          ---------------------------------
          Maarten L. Buis
          University of Konstanz
          Department of history and sociology
          box 40
          78457 Konstanz
          Germany
          http://www.maartenbuis.nl
          ---------------------------------


          • #6
            This code dates back to 1999 and has not been updated since. Although it's not always true, my guess would be that it has been abandoned. That's a term of fact, not of disapproval.


            • #7
              Nick, do you think this is an out-of-date command to use?


              • #8
                Programs don't go bad: if it was good 10 years ago, it is still good now. However, it may be that better programs are written, or better algorithms are developed, i.e. good programs don't get bad, but they may get superseded. Additionally, when a bug is found, it needs to be fixed, i.e. somebody needs to maintain a program. Nick's suspicion (and mine) is that it is no longer maintained. In this case that is a problem, since there appears to be a bug in ginidesc that is not being fixed. However, ineqdeco seems to do what you want, is being maintained, and does not appear to suffer from this bug, so using ineqdeco instead should solve your problem.
                ---------------------------------
                Maarten L. Buis
                University of Konstanz
                Department of history and sociology
                box 40
                78457 Konstanz
                Germany
                http://www.maartenbuis.nl
                ---------------------------------


                • #9
                  Maarten, many thanks. The thing with ineqdeco is that it does not decompose the Gini into within-group and between-group inequality as ginidesc does. Maybe I should do the decomposition manually.


                  • #10
                    The bigger question then is why you wish to undertake an inequality decomposition by population subgroups using the Gini index. It is well known that it is only for the Generalised Entropy class of indices that one can write total inequality as Within-Group Inequality (a subgroup-weighted sum of the inequalities within each subgroup) plus Between-Group Inequality (the inequality arising were each unit attributed with the mean income of his/her subgroup). That's why ineqdeco and ineqdec0 calculate the overall Gini, but do not fully decompose by population subgroup. With the Gini coefficient, total inequality = Within-Group inequality + Between-Group inequality + "overlap" term, where the overlap term disappears only if the subgroup income distributions do not overlap along the income range. [Aronson and Lambert, Economic Journal, argued that this Gini decomposition had a useful interpretation -- but the context they referred to is rather specific, in my view.]
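
                    The two decompositions can be written out as follows. This is a sketch in notation introduced here for illustration, not taken from any program's documentation: p_k is the population share of subgroup k, s_k its income share, GE_k(a) and G_k the subgroup indices, the subscript B marks the index computed after giving every unit its subgroup mean, and R is the overlap (residual) term.

                    ```latex
                    % Generalised Entropy: exact, no residual
                    \[
                    GE(a) \;=\; \sum_{k=1}^{K} p_k^{\,1-a}\, s_k^{\,a}\, GE_k(a) \;+\; GE_B(a)
                    \]

                    % Gini: within + between + overlap
                    \[
                    G \;=\; \sum_{k=1}^{K} p_k\, s_k\, G_k \;+\; G_B \;+\; R,
                    \qquad R = 0 \ \text{only if the subgroup distributions do not overlap}
                    \]
                    ```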

                    Back to Maarten's perceptive remarks: note that ginidesc is written in a rather non-standard way with repeated preserve and restore calls. I suspect the problem you cited arises somewhere with these.

                    [PS please note FAQ's remarks about the use of real names]


                    • #11
                      Stephen, many thanks for your questions about the decomposition. I need to think more about that. What would you suggest for decomposing wage inequality?

                      And many thanks for the reminder about real names. It is weird, but I checked my profile settings and could not find a way to change my name.


                      • #12
                        Point #6 in the FAQ explains how to get your name changed.


                        • #13
                          Rich, thank you so much! I have sent the request.


                          • #14
                            I've looked into Elif's problem in some more depth and I think, as he noticed, that ginidesc is calculating the subgroup Gini indices correctly but may not display them correctly: the subgroup labels are incorrect. However, I have not been able to trace what the problem is within the ginidesc code. Rather than spend more time on that, I've written a new program called ineqdecgini, based on my ineqdec0 code, which has the same functionality as ginidesc, i.e. it provides the decomposition of the Gini into within-group, between-group, and residual (overlap) terms. By contrast with ginidesc, results are saved in r() and output is displayed in a more conventional manner. (Unlike ginidesc, I allow the 'income' variable to have zero or negative values, but these obs can of course be excluded using an if qualifier. Cf. the differences between my ineqdec0 and ineqdeco programs.) The output below shows that Elif's problem does not arise if one uses my program. I'll make ineqdecgini available on SSC when I can. If anyone has any comments or suggestions in the meantime about what the program does (or doesn't) do, based on what you see below, please speak up.

                            Code:
                            . sysuse nlsw88, clear
                            (NLSW, 1988 extract)
                            
                            . ineqdecgini wage if collgrad == 0, by(race)
                              
                            Gini for wage
                            
                            ----------------------
                              All obs |       Gini
                            ----------+-----------
                                      |    0.31841
                            ----------------------
                              
                            Subgroup summary statistics, for each subgroup k = 1,...,K:
                              
                            
                            -------------------------------------------------------------------------------------
                                 race |   Popn. share           Mean  Relative mean   Income share           Gini
                            ----------+--------------------------------------------------------------------------
                                white |       0.71004        7.31825        1.05900        0.75192        0.32444
                                black |       0.28005        5.87592        0.85028        0.23812        0.28939
                                other |       0.00992        6.93819        1.00400        0.00996        0.24844
                            -------------------------------------------------------------------------------------
                              
                            Decomposition: Gini = Gini_within + Gini_between + Residual
                            
                            ------------------------------------------------------------------
                              All obs |         Gini   Gini_within  Gini_between      Residual
                            ----------+-------------------------------------------------------
                                      |      0.31841       0.19254       0.04232       0.08356
                            ------------------------------------------------------------------
                              
                            Decomposition (% of total): Gini = Gini_within + Gini_between + Residual
                            
                            ------------------------------------------------------------------
                              All obs |  Gini_within   Gini_within  Gini_between      Residual
                            ----------+-------------------------------------------------------
                                      |    100.00000      60.46872      13.28961      26.24167
                            ------------------------------------------------------------------
                            Note: Gini_within = weighted sum across groups of subgroup Ginis, 
                             with each subgroup's weight equal to the product of its income share and population share.
                             Gini_between = Gini calculated attributing each obs with the mean of the obs's subgroup.
                            
                            . keep if collgrad == 0
                            (532 observations deleted)
                            
                            . ineqdecgini wage, by(race)
                              
                            Gini for wage
                            
                            ----------------------
                              All obs |       Gini
                            ----------+-----------
                                      |    0.31841
                            ----------------------
                              
                            Subgroup summary statistics, for each subgroup k = 1,...,K:
                              
                            
                            -------------------------------------------------------------------------------------
                                 race |   Popn. share           Mean  Relative mean   Income share           Gini
                            ----------+--------------------------------------------------------------------------
                                white |       0.71004        7.31825        1.05900        0.75192        0.32444
                                black |       0.28005        5.87592        0.85028        0.23812        0.28939
                                other |       0.00992        6.93819        1.00400        0.00996        0.24844
                            -------------------------------------------------------------------------------------
                              
                            Decomposition: Gini = Gini_within + Gini_between + Residual
                            
                            ------------------------------------------------------------------
                              All obs |         Gini   Gini_within  Gini_between      Residual
                            ----------+-------------------------------------------------------
                                      |      0.31841       0.19254       0.04232       0.08356
                            ------------------------------------------------------------------
                              
                            Decomposition (% of total): Gini = Gini_within + Gini_between + Residual
                            
                            ------------------------------------------------------------------
                              All obs |  Gini_within   Gini_within  Gini_between      Residual
                            ----------+-------------------------------------------------------
                                      |    100.00000      60.46872      13.28961      26.24167
                            ------------------------------------------------------------------
                            Note: Gini_within = weighted sum across groups of subgroup Ginis, 
                             with each subgroup's weight equal to the product of its income share and population share.
                             Gini_between = Gini calculated attributing each obs with the mean of the obs's subgroup.
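
                            As a quick arithmetic check on the note above, the within-group term can be reproduced by hand from the subgroup table. This sketch only retypes the rounded figures displayed in the output, so it can agree with the program only up to rounding.

                            ```stata
                            * Gini_within = sum over subgroups of (popn. share x income share x Gini)
                            * Figures are retyped from the displayed table, rows: white, black, other
                            display %7.5f (0.71004*0.75192*0.32444 + 0.28005*0.23812*0.28939 + 0.00992*0.00996*0.24844)

                            * Sum of the three displayed components, to compare with the overall Gini
                            display %7.5f (0.19254 + 0.04232 + 0.08356)
                            ```

                            The first line gives 0.19254, matching the reported Gini_within. The second gives 0.31842, which agrees with the reported overall Gini of 0.31841 up to rounding of the displayed components.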


                            • #15
                              ineqdecgini is now available from SSC.
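
                              For anyone wanting to try it, a minimal sketch using the same example data as earlier in the thread. The names of the saved results are not assumed here; return list simply shows whatever the command leaves behind in r().

                              ```stata
                              * Install from SSC (needs internet access)
                              ssc install ineqdecgini

                              * Same example as earlier in the thread
                              sysuse nlsw88, clear
                              ineqdecgini wage if collgrad == 0, by(race)

                              * Inspect the results left behind in r()
                              return list
                              ```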
