I am working with CPS data in Stata 13 (OS X Yosemite 10.10.2, 2.3 GHz Intel Core i7, 16 GB 1600 MHz DDR3). For each year 1962-2014, I want to calculate the Gini of each age group between 20 and 80 (so 3720 different Ginis). The entire dataset has about 5 million observations. I am looking for a way to speed up this calculation. The first code block below takes XXXX minutes to run.
I used the second code block below for a quick comparison of different commands in terms of speed and accuracy. ineqdec0 took about 90 seconds, fastgini about 50 seconds, and inequal7 about 110 seconds. inequal7, however, is too imprecise for my purposes: its Gini differs from ineqdec0's by up to 0.5. I think the calculation could be sped up by editing ineqdec0.ado so that it calculates only the Gini and not the other inequality measures, but I don't know how to do this.
Thanks, Joachim
Code:
foreach yyy in $years {
    qui ineqdec0 inctot_adj [w=wtsupp] if year==`yyy', by(cohort)
    foreach ccc in $cohorts {
        qui di r(gini_`ccc')
        qui replace apcgini=r(gini_`ccc') if cohort==`ccc' & year==`yyy'
    }
}
Code:
set more off
qui gen g1=.
qui gen g2=.
qui gen g3=.

qui timer on 1
qui ineqdec0 inctot_adj [w=wtsupp], by(year)
qui foreach yyy in $years {
    qui di r(gini_`yyy')
    qui replace g1=r(gini_`yyy') if year==`yyy'
}
qui timer off 1

qui timer on 2
qui foreach yyy in $years {
    qui fastgini inctot_adj [w=wtsupp] if year==`yyy'
    qui di r(gini)
    qui replace g2=r(gini) if year==`yyy'
}
qui timer off 2

qui timer on 3
qui foreach yyy in $years {
    qui inequal7 inctot_adj [w=wtsupp] if year==`yyy'
    qui di r(gini)
    qui replace g3=real(r(gini)) if year==`yyy'
}
qui timer off 3

qui gen gdiff2=g1-g2
qui gen gdiff3=g1-g3
table year, contents(mean g1 mean g2 mean g3)
table year, contents(mean gdiff2 mean gdiff3)
timer list
timer clear
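For reference, a Gini-only calculation might avoid the loops entirely by sorting once and using running sums within each year-by-cohort cell. The sketch below is not an edit of ineqdec0.ado. It is a different approach, based on the trapezoid area under the weighted Lorenz curve, and the program name ginicells and its by() and generate() options are hypothetical names made up for illustration. It assumes strictly positive weights and a positive income total in every cell, and it has not been checked against ineqdec0 on the CPS data, so its results may differ slightly from ineqdec0's weighted formula.

```stata
capture program drop ginicells
program define ginicells
    // Hypothetical helper (not part of any package):
    //   ginicells income weight, by(groupvars) generate(newvar)
    syntax varlist(min=2 max=2 numeric), BY(varlist) GENerate(name)
    tokenize `varlist'
    local y `1'
    local w `2'
    tempvar cw cwy lor lag area
    quietly {
        // Use only observations with nonmissing income, weight and group
        // variables, and with a strictly positive weight
        marksample touse
        markout `touse' `by', strok
        replace `touse' = 0 if `w' <= 0

        // One sort: income ascending within each cell
        bysort `touse' `by' (`y'): gen double `cw' = sum(`w')
        by `touse' `by': gen double `cwy' = sum(`w' * `y')

        // Lorenz ordinate L_i and its lag L_{i-1} (0 for the first obs)
        by `touse' `by': gen double `lor' = `cwy' / `cwy'[_N]
        by `touse' `by': gen double `lag' = cond(_n == 1, 0, `lor'[_n-1])

        // Gini = 1 - sum_i p_i * (L_i + L_{i-1}), with p_i = w_i / sum(w)
        by `touse' `by': gen double `area' = sum(`w' / `cw'[_N] * (`lor' + `lag'))
        by `touse' `by': gen double `generate' = 1 - `area'[_N] if `touse'
    }
end
```

With the variables above this would be called as `ginicells inctot_adj wtsupp, by(year cohort) generate(apcgini2)`, where apcgini2 is again just an illustrative name. Comparing apcgini2 with the ineqdec0 results for a few cells would show whether the two definitions agree closely enough.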