Cumulative sums in descending order within groups

Tom Kemeny

Join Date: Apr 2016

Posts: 52
#1

Cumulative sums in descending order within groups

19 Mar 2021, 07:40

Hello Statalisters,

I'm using Stata 16.1, and I have panel data in which the unit of observation is the region (called czone). I have a few measures of interest that, for a given indicator, represents each region’s share of the nation in a given year. Each of these ends in 'shr'. Here is a random sample of my data, generated thru dataex.

Code:

year czone normshr lesshr leoshr leshr 1960 1202 .0003638664 .00004861347 .00023602384 0 1990 10000 .000834548 .0003118703 .0005555756 .0002700112 1990 10102 .0006462617 .000217258 .0004091604 .0000675028 1970 17501 .0015207013 .0008979761 .001310687 .0004139073 2010 18400 .0006034978 .00019348213 .0004174155 .00013348313 2000 26201 .0007506197 .0003190925 .0006227999 .0003383522 2019 26203 .0006399278 .0001512621 .0004416599 .00013742084 1970 27603 .0005867054 .0002417628 .0006869808 0 1960 35002 .0003008295 .0000763926 .00029739 .00022237047 1950 36404 .0007740941 .000380904 .0007541478 0

For each year in the data, and within that for each 'shr' variable, what I want is to generate a variable that counts the number of regions it takes to reach a cumulative sum of 0.8. Ultimately, what I was thinking was that I want a summary dataset where the unit of observation is the year, that looks like:

Code:

year leshr80 lesshr80 leoshr80 normshr80

where the variables ending in 80 display the number of regions it took to get to an 80 percent cumulative sum on the underlying share variable. I don't care about the naming - just trying to make my goal clear.

I’m struggling to figure out how to do this. I see three issues - I am not sure how to solve any of them.

First, I need cumulative counts, from data sorted in descending order within each year. But ‘gsort’ does not play with ‘by’. I could chop the data up into years, but this seems clumsy.

Second, I tried dropping all but one year to see if I could make a go of it year by year, but I’m still stumped, this time the (seemingly simple) task of being able to generate a descending cumulative sum of the shares (highest to lowest). As an example, using the share indicator called leshr, I tried the following:

Code:

gsort -leshr g cumut=0 replace cumut=leshr[_n] + cumut[_n-1]

But this returned all missings. I think I see why, but I'm not sure what to do about it.

Third, if I could fix the above problems, I'm still not sure how to efficiently create a summary variable for each 'shr' variable that told me how many observations it took to get to the cumulative total of 0.8.

I hope I'm making sense. Thanks in advance.

Tom
Tags: None
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#2

19 Mar 2021, 08:05

The description of your data was too complicated and I lost the plot.

Your command generates all missings, because in the first period the period before is missing, so the first is missing, and then from there on everything is missing.

Try whether the following is doing what you want:

Code:

gsort -leshr gen cumut=sum(leshr)

Or alternatively:

Code:

gsort -leshr gen cumut=0 replace cumut=leshr[_n] + cumut[_n-1] if _n>1
Comment
Tom Kemeny

Join Date: Apr 2016

Posts: 52
#3

19 Mar 2021, 08:43

Thanks - this solves part 2 of my problem. I naively thought sum gave a grand not running total. I stand corrected.

But (part 1), I cannot do this within years - since I cannot do the first line in the following:

Code:

bys year: gsort -leshr bys year: gen cumut=sum(leshr)

And for part 3 of my problem: if I had a variable with yearly running sums of leshr, as above, how to summarise it such that the summary variable returns the count of observations it takes for the running sum variable to reach 0.8?

Thanks.
Comment

Kelvin Foo

Join Date: May 2017
Posts: 6

20 Mar 2021, 11:43

Using another data example such that the cumulative sums exceed 0.8, see if this code helps:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(year czone) double(normshr lesshr)
1990 1000  .1  .9
1990 2000  .2 .04
1990 3000  .2   0
1990 4000  .2 .01
1990 5000  .2 .03
1990 6000  .1 .02
1995 1000 .05 .05
1995 2000  .3 .08
1995 3000  .4  .8
1995 4000 .05   0
1995 5000  .1 .02
1995 6000   0   0
1995 7000  .1 .05
2000 3000 .25  .1
2000 4000 .25  .4
2000 5000 .25  .2
2000 6000 .25  .3
end

reshape long @shr, i(year czone) j(indicator) string

gsort year indicator -shr
by year indicator: g double cumut = sum(shr)
by year indicator: g shr80 = _n if cumut>=.8 & (cumut[_n-1]<.8 | _n==1)

keep if !missing(shr80)
keep year indicator shr80
reshape wide @shr80, i(year) j(indicator) string

list, noobs clean

Output:

Code:

  year   lesshr80   norms~80  
    1990          1          4  
    1995          1          4  
    2000          3          4

Last edited by Kelvin Foo; 20 Mar 2021, 12:10. Reason: Formatting, and misread original question..

Comment

Tom Kemeny

Join Date: Apr 2016

Posts: 52
#5

23 Mar 2021, 10:18

Thank you Kelvin - this is perfect!
Comment

Announcement

Cumulative sums in descending order within groups

Comment

Comment

Comment

Comment