Attempting to count continuous clusters of dummy variables

Faraz Usmani

Join Date: Apr 2014

Posts: 2
#1

23 Apr 2014, 11:34

I have panel data on the annual credit-to-GDP ratio (c_gdp) for 48 countries (countrycode) from 1980 to 2010 (year). I have generated a dummy variable (boom) that is 1 if the change in the credit-to-GDP ratio over the previous four years (including the current year) is positive, i.e. a 'credit boom', and missing otherwise.

Given the nature of the data, this produces separate clusters of consecutive boom years. For example, boom = 1 when year = 1987, 1988, 1989, 1990, and 1991 (5 consecutive years); boom is then missing until it equals 1 again for year = 2001 through 2009 (9 consecutive years). I would like to count the number of 1's in each of these clusters (i.e., 5 for the first cluster, 9 for the second cluster, and so on), but I cannot figure out how to count the clusters separately for each country within the same variable.

For your reference, here is an example using grunfeld, where I would like to count the separate clusters of 1's in the 'boom' variable for each company:

webuse grunfeld, clear
drop invest kstock time
tsset company year
gen D_mvalue = D.mvalue
gen boom = 1 if D_mvalue > 0 & !mi(D_mvalue)

Thank you for your consideration.
Tags: categorical, continuous, count, panel
Dave Airey

Join Date: Apr 2014

Posts: 398
#2

23 Apr 2014, 11:45

How about changing the 1s and missings in the boom variable to characters and spaces, concatenating the whole column to a string, and then counting words in the resulting string?

You may also find a solution in the SSC SQ package, but that's just a guess.

Jeph Herrin

Join Date: Apr 2014
Posts: 335

23 Apr 2014, 12:16

Using your example:

Code:

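* start each run at 1, then add 1 for every further consecutive boom year
gen boomyears = boom
bys company (year) : replace boomyears = boomyears[_n-1] + 1 if !mi(boomyears[_n-1]) & !mi(boomyears)
* walk backwards in time to copy each run's final count to all of its years
gen negyear = -year
bys company (negyear) : replace boomyears = boomyears[_n-1] if !mi(boomyears[_n-1]) & !mi(boom)
drop negyear
* restore the panel sort order
sort company year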

hth,
Jeph
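
For comparison, here is a minimal sketch of another way to get the same per-run counts, by numbering the runs first and then counting observations per run. It assumes the grunfeld example from the question with no gaps in year within company; the variable names start, spell, and boomyears2 are made up for illustration and are not from the posts above.

```stata
webuse grunfeld, clear
tsset company year
gen byte boom = 1 if D.mvalue > 0 & !mi(D.mvalue)

* flag the first year of each run: boom now, but not in the previous year
bysort company (year): gen byte start = boom == 1 & (_n == 1 | boom[_n-1] != 1)

* a running sum of the start flags numbers the runs within each company
by company: gen spell = sum(start) if boom == 1

* run length = number of observations sharing a company and run number
bysort company spell: gen boomyears2 = _N if !mi(spell)

sort company year
```

Every year inside a run then carries that run's length in boomyears2, and years outside any run stay missing. Keeping the run number in spell makes it easy to reduce the data to one row per run later if needed.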

Faraz Usmani

Join Date: Apr 2014

Posts: 2
#4

23 Apr 2014, 12:34

Thank you very much for your help, Jeph!
Dave Airey

Join Date: Apr 2014

Posts: 398
#5

23 Apr 2014, 12:35

Very nice use of "bysort" and "_n"! I was recently working with Python and DNA strings and got mentally fixated I guess. That's a good reminder for me about _n and _N and by.
Nick Cox

Join Date: Mar 2014

Posts: 35697
#6

23 Apr 2014, 13:11

Alternatively, see tsspell (SSC) for continuous clusters treated as spells.

Code:

tsspell, c(boom==1)

generates basic spell variables here.
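
As a rough sketch of how those spell variables could be turned into the counts asked for: by default tsspell creates _spell (spell number, 0 outside spells), _seq (position within the spell), and _end (1 in the last observation of a spell). The example below assumes tsspell has been installed from SSC and reuses the grunfeld setup from the question; boomlen is a made-up variable name.

```stata
* assumes tsspell is installed: ssc install tsspell
webuse grunfeld, clear
tsset company year
gen byte boom = 1 if D.mvalue > 0 & !mi(D.mvalue)

tsspell, cond(boom == 1)

* the largest _seq within a spell is the length of that spell
egen boomlen = max(_seq), by(company _spell)

* observations outside any spell have _spell == 0; blank out their length
replace boomlen = . if _spell == 0

* one line per spell: the final year of each boom and its length
list company year boomlen if _end, sepby(company)
```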

That's the practice. The theory is in

SJ-7-2  dm0029  Speaking Stata: Identifying spells  (N. J. Cox)
Q2/07   SJ 7(2):249--265  (no commands)
shows how to handle spells with complete control over spell specification

http://www.stata-journal.com/sjpdf.h...iclenum=dm0029

Last edited by Nick Cox; 23 Apr 2014, 13:13.
Dave Airey

Join Date: Apr 2014

Posts: 398
#7

23 Apr 2014, 13:35

Good to have that PDF, thanks. What's the difference between Speaking Stata and Stata Tips, by the way? Is it one of length?
Nick Cox

Join Date: Mar 2014

Posts: 35697
#8

23 Apr 2014, 16:13

Stata Tips are written by many people and are short.

Speaking Stata is written by me (with occasional co-authors).