  • Attempting to count continuous clusters of dummy variables

    I have panel data on the annual credit-to-GDP ratio (c_gdp) for 48 countries (countrycode) from 1980 to 2010 (year). I've generated a dummy variable (boom) that is 1 if the change in the credit-to-GDP ratio over the previous four years (including the current year) is positive, i.e. a 'credit boom'.

    Given the nature of the data, this results in separated clusters of such booms where, for example, boom = 1 when year = 1987, 1988, 1989, 1990, and 1991 (5 consecutive years). The 'boom' variable is then a missing value until boom = 1 again when year = 2001 ... 2009 (9 consecutive years). I would like to count the number of 1's in these separate clusters (i.e., 5 for the first cluster, 9 for the second cluster, and so on) and am unable to figure out how to count separate clusters for each country within the same variable.

    For your reference, here is an example using grunfeld, where I would like to count the separate clusters of 1's in the 'boom' variable for each company:

    webuse grunfeld, clear
    drop invest kstock time
    tsset company year
    gen D_mvalue = D.mvalue
    gen boom = 1 if D_mvalue > 0 & !mi(D_mvalue)

    Thank you for your consideration.

  • #2
    How about changing the 1s and missings in the boom variable to characters and spaces, concatenating the whole column to a string, and then counting words in the resulting string?

    You may also find a solution in the SSC SQ package, but that's just a guess.
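
    The string idea above can be sketched directly in Stata. This is only an illustration on the grunfeld example from the first post, not a tested recipe for the real data: the variable names s, nboom, nspells and len1 are made up here, and it assumes one observation per company per year with no gaps.

    ```stata
    webuse grunfeld, clear
    tsset company year
    gen boom = 1 if D.mvalue > 0 & !mi(D.mvalue)

    * one character per year: "1" for a boom year, a space otherwise
    bysort company (year): gen s = cond(boom == 1, "1", " ")
    * accumulate the characters down the years within each company
    by company: replace s = s[_n-1] + s if _n > 1
    * copy the complete history (held in the last row) to every row
    by company: replace s = s[_N]

    * each "word" in the string is one cluster of consecutive boom years
    gen nspells = wordcount(s)
    * length of the first cluster (hypothetical name len1), missing if there is none
    gen len1 = strlen(word(s, 1)) if nspells > 0
    * total number of boom years per company, for cross-checking
    egen nboom = total(boom == 1), by(company)
    ```

    The lengths of later clusters would come from word(s, 2), word(s, 3) and so on, which gets clumsy quickly; that is the main practical drawback of the string route.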



    • #3

      Using your example:

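      Code:
      gen boomyears=boom
      * forward pass: running count of consecutive boom years within each company
      bys company (year) : replace boomyears=boomyears[_n-1]+ 1 if !mi(boomyears[_n-1]) & !mi(boomyears)
      * backward pass: copy the final count (the cluster length) to every year of the cluster
      gen negyear=-year
      bys company (negyear) : replace boomyears=boomyears[_n-1] if !mi(boomyears[_n-1]) & !mi(boom)
      drop negyear

      hth,
      Jeph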
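
      A variant of the same by-group logic, offered only as a sketch: first give each run of boom years an identifier, then count the observations that share it. The names start, spell and length are hypothetical, and the code assumes there are no gaps in the years within a company.

      ```stata
      webuse grunfeld, clear
      tsset company year
      gen boom = 1 if D.mvalue > 0 & !mi(D.mvalue)

      * a run starts where boom is 1 and the previous year of the same company is not
      bysort company (year): gen byte start = boom == 1 & boom[_n-1] != 1
      * number the runs within company; non-boom years stay missing
      by company: gen spell = sum(start) if boom == 1
      * run length = number of observations sharing the same company and spell
      bysort company spell: gen length = _N if boom == 1

      sort company year
      list company year boom spell length if company == 1, sepby(spell)
      ```

      Having an explicit spell identifier also makes it easy to keep one row per cluster afterwards, for example by tagging the first year of each spell.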



      • #4
        Thank you very much for your help, Jeph!



        • #5
          Very nice use of "bysort" and "_n"! I was recently working with Python and DNA strings and got mentally fixated I guess. That's a good reminder for me about _n and _N and by.



          • #6
            Alternatively, see tsspell (SSC) for continuous clusters treated as spells.

            Code:
            tsspell, c(boom==1)
            generates basic spell variables here.
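
            For readers who want to see it end to end, here is a minimal sketch on the grunfeld example from the first post. It assumes the default variable names that tsspell creates (_spell, _seq, _end); the name length is made up for illustration.

            ```stata
            * tsspell is user-written; install it from SSC
            ssc install tsspell, replace

            webuse grunfeld, clear
            tsset company year
            gen boom = 1 if D.mvalue > 0 & !mi(D.mvalue)

            * by default creates _spell (spell number, 0 outside spells),
            * _seq (position within the spell) and _end (1 in the last year of a spell)
            tsspell, cond(boom == 1)

            * spell length = highest sequence number reached in each spell
            egen length = max(_seq), by(company _spell)
            replace length = . if _spell == 0

            list company year boom _spell _seq _end length if company == 1, sepby(_spell)
            ```

            Because tsspell works from the tsset declaration, it respects the panel structure, so spells are never joined across companies.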

            That's the practice. The theory is in

            N. J. Cox. 2007. Speaking Stata: Identifying spells. Stata Journal 7(2): 249-265 (SJ-7-2, dm0029; no commands). Shows how to handle spells with complete control over spell specification.

            http://www.stata-journal.com/sjpdf.h...iclenum=dm0029
            Last edited by Nick Cox; 23 Apr 2014, 14:13.



            • #7
              Good to have that PDF, thanks. What's the difference between Speaking Stata and Stata Tips, by the way? Is it one of length?



              • #8
                Stata Tips are written by many people and are short.

                Speaking Stata is written by me (with occasional co-authors).
