  • Quickest way to calculate many Ginis

    I am working with CPS data in Stata 13. For each year from 1962 to 2014, I want to calculate the Gini of each age group between 20 and 80 (so 3720 different Ginis). The entire dataset has ~5m observations. I am looking for a way to speed up this calculation:

    Code:
    qui foreach yyy in $years {
        ineqdec0 inctot_adj [w=wtsupp] if year==`yyy', by(cohort)
        foreach ccc in $cohorts {
            di r(gini_`ccc')
            replace apcgini=r(gini_`ccc') if cohort==`ccc' & year==`yyy'
        }
    }
    I'm thinking for example that it could be accelerated by editing ineqdec0.ado to calculate the Gini only and not the other inequality measures, but I am not sure how to do this.

    I used the code below to do a quick comparison of different commands in terms of speed and accuracy. ineqdec0 took about 90 seconds, fastgini about 50, and inequal7 about 110. The latter, however, is too imprecise for my purposes (inequal7's Gini differed from ineqdec0's Gini by up to 0.09 for some years).

    Code:
    set more off
    qui gen g1=.
    qui gen g2=.
    qui gen g3=.
    
    qui timer on 1
    qui ineqdec0 inctot_adj [w=wtsupp], by(year)
    qui foreach yyy in $years {
        di r(gini_`yyy')
        replace g1=r(gini_`yyy') if year==`yyy'
    }
    qui timer off 1
    
    qui timer on 2
    qui foreach yyy in $years {
        fastgini inctot_adj [w=wtsupp] if year==`yyy'
        di r(gini)
        replace g2=r(gini) if year==`yyy'
    }
    qui timer off 2
    
    qui timer on 3
    qui foreach yyy in $years {
        inequal7 inctot_adj [w=wtsupp] if year==`yyy'
        di r(gini)
        replace g3=real(r(gini)) if year==`yyy'
    }
    qui timer off 3
    
    qui gen gdiff2=g1-g2
    qui gen gdiff3=g1-g3
    table year, contents(mean g1 mean g2 mean g3)
    table year, contents(mean gdiff2 mean gdiff3)
    
    timer list

    Thanks, Joachim

  • #2
    If the Gini coefficient is the only statistic that you want to calculate, I would recommend Philippe Van Kerm's sgini module. Have a look at http://medim.ceps.lu/?id=software to get it, as well as a PDF manual. I'd be interested in seeing your timer comparisons in this case.

    BTW: (1) Were you using the most recent version of ineqdec0 (on SSC; version 2.0.2 May 2008)? (2) I conjecture that the ancient inequal7 gives different answers because of differences in the way ties are treated. (ineqdeco, ineqdec0, and sgini follow the same rules for treatment of tied values; I can't speak for fastgini.)

    PS Joachim: welcome to Statalist. Please follow the FAQ recommendation to state the provenance of user-written software.
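
    As a small illustration of how the installed version and provenance of a user-written command can be checked, the following sketch uses two standard Stata commands. ineqdec0 is used as the example here; the same calls apply to the other SSC commands in this thread.

    ```stata
    * Show the path and the version line(s) of the installed ado-file
    which ineqdec0

    * Describe the package as currently distributed on SSC
    ssc describe ineqdec0
    ```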

    • #3
      Thanks for the heads-up, Stephen. I am indeed using ineqdec0 version 2.0.2 May 2008 from SSC. fastgini and inequal7 are also from SSC.

      I made a mistake in my previous post. It was fastgini, not inequal7, which yielded imprecise results. However, when I ran the code below, I found that all the Gini estimates were identical. The timer comparison read:
      ineqdec0: 85.87s
      fastgini: 48.37s
      inequal7: 98.6s
      sgini: 791.75s

      Code:
      qui gen g1=.
      qui gen g2=.
      qui gen g3=.
      qui gen g4=.
      
      qui timer on 1
      qui ineqdec0 inctot_adj [w=wtsupp], by(year)
      qui foreach yyy in $years {
          replace g1=r(gini_`yyy') if year==`yyy'
      }
      qui timer off 1
      
      qui timer on 2
      qui foreach yyy in $years {
          fastgini inctot_adj [w=wtsupp] if year==`yyy'
          replace g2=r(gini) if year==`yyy'
      }
      qui timer off 2
      
      qui timer on 3
      qui foreach yyy in $years {
          inequal7 inctot_adj [w=wtsupp] if year==`yyy'
          replace g3=real(r(gini)) if year==`yyy'
      }
      qui timer off 3
      
      qui timer on 4
      qui foreach yyy in $years {
          sgini inctot_adj [aw=wtsupp] if year==`yyy'
          replace g4=r(coeff) if year==`yyy'
      }
      qui timer off 4
      
      qui gen gdiff2=g1-g2
      qui gen gdiff3=g1-g3
      qui gen gdiff4=g1-g4
      table year, contents(mean g1 mean g2 mean g3 mean g4)
      table year, contents(mean gdiff2 mean gdiff3 mean gdiff4)
      
      timer list

      • #4
        Do you really mean "sgini: 791.75s" or is that a typo? If it's a genuine timing, I admit great surprise! It's also puzzling that you now get identical results from all programs but didn't before. Were you using the same test data set in each case? (If they differed, did one contain more ties than the other? Was there an issue with weights, perhaps? Or differences in the prevalence of obs with zero values? Check that all programs treat these in the same way!)

        • #5
          I redid the calculation and it is not a typo. I used the same data and dropped all zero values before doing the calculation, so that this would not factor in. All of the commands use aweights except fastgini, which uses pweights. It is my understanding that this makes no difference to the Gini point estimate, only to the calculation of standard errors. I cannot locate the source of the differing results I got the first time, though I must have made a mistake somewhere. I now consistently get only negligible discrepancies (on the order of 1e-08) across all commands, so it seems that fastgini is the best one for my purposes.
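
          Since only the point estimate is needed, the Gini of every year-cohort cell could also be computed directly in a single sorted pass, rather than by calling an inequality command once per year. The following is a minimal sketch, not a tested replacement for ineqdec0 or fastgini. It assumes the variables year, cohort, inctot_adj and wtsupp from the first post and uses the trapezoid (Lorenz curve) form of the Gini. The names touse, cumw, cumwy, lorenz, area, cumarea and apcgini_direct are hypothetical. Only each observation's share of its cell's total weight enters the formula, which is consistent with aweights and pweights giving the same point estimate.

          ```stata
          * Mark usable observations (assumption: nonmissing income, positive weight)
          gen byte touse = !missing(inctot_adj, wtsupp) & wtsupp > 0

          * Sort by income within each year-cohort cell
          sort touse year cohort inctot_adj

          * Running sums of weight and of weighted income within each cell
          by touse year cohort: gen double cumw  = sum(wtsupp)
          by touse year cohort: gen double cumwy = sum(wtsupp*inctot_adj)

          * Lorenz ordinate: cumulative income share up to this observation
          by touse year cohort: gen double lorenz = cumwy/cumwy[_N]

          * Trapezoid term: weight share times (L_i + L_{i-1}), with L_0 = 0
          by touse year cohort: gen double area = (wtsupp/cumw[_N]) * ///
              (lorenz + cond(_n==1, 0, lorenz[_n-1]))

          * Gini = 1 - sum of trapezoid terms; stored on every row of the cell
          by touse year cohort: gen double cumarea = sum(area)
          by touse year cohort: gen double apcgini_direct = 1 - cumarea[_N] if touse

          drop cumw cumwy lorenz area cumarea
          ```

          Before relying on this, it would be sensible to compare apcgini_direct with the ineqdec0 or fastgini results for a handful of cells. Cells whose total weighted income is zero or negative will not give a meaningful value.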

          • #6
            The fastgini command does not work with negative values. I am using it on NFHS-5 data for wealth scores (variable hv271). The following message appeared: "hv271 has 12026 values <= 0. Not used in calculations". How can I fix this?
            Last edited by Abhishek Kumar; 03 Mar 2024, 03:37.

            • #7
              -ineqdec0- (SSC) allows values of zero or less than zero (as well as positive values)

              • #8
                Thank you for the response. How can I estimate a confidence interval for the Gini coefficient using the ineqdec0 command? Please help.

                • #9
                  bootstrap?
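
                  One way to read this suggestion, as a minimal sketch: wrap ineqdec0 in Stata's bootstrap prefix and collect r(gini). The number of replications and the seed below are arbitrary illustrative choices, not recommendations. The example is unweighted and ignores the survey design, because, as far as I know, the bootstrap prefix does not accept weights in the prefixed command. NFHS sampling weights and clustering would need separate handling.

                  ```stata
                  * Bootstrap the Gini of hv271 as returned by ineqdec0 (SSC)
                  * reps() and seed() values are illustrative assumptions
                  bootstrap gini = r(gini), reps(500) seed(12345): ineqdec0 hv271

                  * Percentile and bias-corrected intervals from the same replications
                  estat bootstrap, percentile bc
                  ```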
