I am working with CPS data in Stata 13. For each year from 1962 to 2014, I want to calculate the Gini coefficient of each age group between 20 and 80 (so 3720 different Ginis). The entire dataset has about 5 million observations. I am looking for a way to accelerate this calculation.
One idea is to edit ineqdec0.ado so that it calculates only the Gini and skips the other inequality measures, but I am not sure how to do this.
Code:
qui foreach yyy in $years {
    ineqdec0 inctot_adj [w=wtsupp] if year==`yyy', by(cohort)
    foreach ccc in $cohorts {
        di r(gini_`ccc')
        replace apcgini=r(gini_`ccc') if cohort==`ccc' & year==`yyy'
    }
}
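For reference, the Gini-only idea can also be sketched without any user-written command, by computing a weighted Gini for every year-cohort cell in one sorted pass instead of looping. This is a minimal sketch, not ineqdec0's actual code. It assumes wtsupp is treated as an analytic weight, uses the midpoint cumulative-share formula G = 2*sum(w*y*F)/sum(w*y) - 1, and introduces hypothetical variable names (touse, cumw, totw, fmid, swy, swyf, apcgini2). ineqdec0 may apply slightly different conventions, so the results should be checked against it on a few cells before relying on them.

```stata
* Sketch only: all new variable names below are hypothetical.
* Mark usable observations (non-missing, strictly positive weight).
gen byte touse = !missing(inctot_adj, wtsupp, year, cohort) & wtsupp > 0

* Sort once so that incomes are ascending within each year-cohort cell.
sort touse year cohort inctot_adj

* Running and total weight within each cell.
by touse year cohort: gen double cumw = sum(wtsupp)
by touse year cohort: gen double totw = cumw[_N]

* Midpoint cumulative population share F of each observation.
gen double fmid = (cumw - wtsupp/2) / totw

* Running sums of w*y and w*y*F; the last value in a cell is the cell total.
by touse year cohort: gen double swy  = sum(wtsupp * inctot_adj)
by touse year cohort: gen double swyf = sum(wtsupp * inctot_adj * fmid)

* Gini per cell, stored on every observation of that cell.
by touse year cohort: gen double apcgini2 = 2 * swyf[_N] / swy[_N] - 1 if touse

drop cumw totw fmid swy swyf
```

Because the work is one sort plus a few by-group running sums, the cost should not grow with the number of year-cohort cells the way the loop does. The formula needs a positive weighted mean income in each cell.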
I used the code below for a quick comparison of different commands in terms of speed and accuracy. ineqdec0 took about 90 seconds, fastgini about 50, and inequal7 about 110. inequal7, however, is too imprecise for my purposes: its Gini differed from ineqdec0's Gini by up to 0.09 for some years.
Code:
set more off
qui gen g1=.
qui gen g2=.
qui gen g3=.

qui timer on 1
qui ineqdec0 inctot_adj [w=wtsupp], by(year)
qui foreach yyy in $years {
    di r(gini_`yyy')
    replace g1=r(gini_`yyy') if year==`yyy'
}
qui timer off 1

qui timer on 2
qui foreach yyy in $years {
    fastgini inctot_adj [w=wtsupp] if year==`yyy'
    di r(gini)
    replace g2=r(gini) if year==`yyy'
}
qui timer off 2

qui timer on 3
qui foreach yyy in $years {
    inequal7 inctot_adj [w=wtsupp] if year==`yyy'
    di r(gini)
    replace g3=real(r(gini)) if year==`yyy'
}
qui timer off 3

qui gen gdiff2=g1-g2
qui gen gdiff3=g1-g3
table year, contents(mean g1 mean g2 mean g3)
table year, contents(mean gdiff2 mean gdiff3)
timer list
Thanks, Joachim