Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting the number of consecutive same observations in panel data

    Dear all,
    Hi, I have a panel data (CRSP mutual fund database), where crsp_fundno identifies a mutual fund and qdatev identifies the quarterly observation time. I am interested in expense ratio (exp_ratio) data point. For example, the following sample lists expense ratio for mutual fund crsp_fundno==53 from 3rd quarter of 1992 to end of 1994
    +------------------------------+
    | crsp_fundno qdatev exp_ratio | WANTED VARIABLE
    |------------------------------|
    67. | 53 1992q3 .0091 | 2
    68. | 53 1992q4 .0091 | 2
    69. | 53 1993q1 .0084 | 4
    70. | 53 1993q2 .0084 | 4
    71. | 53 1993q3 .0084 | 4
    |------------------------------|
    72. | 53 1993q4 .0084 | 4
    73. | 53 1994q1 .0072 | 12
    74. | 53 1994q2 .0072 | 12
    75. | 53 1994q3 .0072 | 12
    76. | 53 1994q4 .0072 | 12
    |------------------------------|
    77. | 53 1995q1 .0072 | 12
    78. | 53 1995q2 .0072 | 12
    79. | 53 1995q3 .0072 | 12
    80. | 53 1995q4 .0072 | 12
    81. | 53 1996q1 .0072 | 12
    |------------------------------|
    82. | 53 1996q2 .0072 | 12
    83. | 53 1996q3 .0072 | 12
    84. | 53 1996q4 .0072 | 12
    85. | 53 1997q1 .0071 | 4
    86. | 53 1997q2 .0071 | 4
    |------------------------------|
    87. | 53 1997q3 .0071 | 4
    88. | 53 1997q4 .0071 | 4
    89. | 53 1998q1 . | 2
    90. | 53 1998q2 . | 2
    91. | 53 1998q3 .0067 | 4
    |------------------------------|
    92. | 53 1998q4 .0067 | 4
    93. | 53 1999q1 .0067 | 4
    94. | 53 1999q2 .0067 | 4
    95. | 53 1999q3 .0076 | 2
    96. | 53 1999q4 .0076 | 2

    One can note that exp_ratio is constant for few quarters before jumping to a new value (or turn missing). I would like to create a variable that records the length of time a particular value of exp_ratio is relevant; essentially, starting from the top .0091 appears twice, so record 2, then .0084 appears four times, so record 4, etc. In the example above, I manually counted and created the WANTED VARIABLE, which has desired properties.
    I'm quite lost as to how to implement this WANTED VARIABLE in Stata, however.. I'd be really grateful for any and all suggestions. Thank you very much in advance!!

    Best,
    John

  • #2
    John,

    Speaking Stata: Identifying spells by Nicholas J. Cox (The Stata Journal, Volume 7 Number 2) is an excellent reference.
    You should:

    1. Read the FAQ carefully.

    2. "Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!"

    3. Describe your dataset. Use list to list data when you are doing so. Use input to type in your own dataset fragment that others can experiment with.

    4. Use the advanced editing options to appropriately format quotes, data, code and Stata output. The advanced options can be toggled on/off using the A button in the top right corner of the text editor.

    Comment


    • #3
      tsspell from SSC is an implementation of the principles in the paper mentioned by Roberto Ferrer.

      Code:
      ssc inst tsspell
      tsset crsp_fundno qdatev
      tsspell exp_ratio
      bysort crsp_fundno _spell : gen length = _N

      Comment


      • #4
        Dear Nick and Roberto,
        Thank you so much for your help! It works out really well. Can I ask one more question if possible? I used Nick's code and for example, I get

        +----------------------------------------------+
        | crsp_fundo year qdatev exp_ratio length |
        |----------------------------------------------|
        41113. | 4785 1995 1995q1 .0099 7 | should be 4
        41114. | 4785 1995 1995q2 .0099 7 | should be 4
        41115. | 4785 1995 1995q3 .0099 7 | should be 4
        41116. | 4785 1995 1995q4 .0099 7 | should be 4
        41117. | 4785 1998 1998q1 . 7 | should be 3
        |----------------------------------------------|
        41118. | 4785 1998 1998q2 . 7 | should be 3
        41119. | 4785 1998 1998q3 . 7 | should be 3
        41120. | 4785 1998 1998q4 .0098 4 |
        41121. | 4785 1999 1999q1 .0098 4 |
        41122. | 4785 1999 1999q2 .0098 4 |
        |----------------------------------------------|
        41123. | 4785 1999 1999q3 .0098 4 |
        41124. | 4785 1999 1999q4 .0099 4 |
        41125. | 4785 2000 2000q1 .0099 4 |
        41126. | 4785 2000 2000q2 .0099 4 |
        41127. | 4785 2000 2000q3 .0099 4 |
        |----------------------------------------------|
        41128. | 4785 2000 2000q4 .0124 5 |
        41129. | 4785 2001 2001q1 .0124 5 |
        41130. | 4785 2001 2001q2 .0124 5 |
        41131. | 4785 2001 2001q3 .0124 5 |
        41132. | 4785 2001 2001q4 .0124 5 |
        +----------------------------------------------+

        One can see that length does not identify the length of spell ".0099" or . (missing value). This appears to happen quite a number of times (enough that I can't fix it manually).
        I'm guessing that this is happening because there is a gap in time (time jumps from 4th quarter, 1995 to 1st quarter, 1998).
        Is there a way to make the code ignore this time gap? Ideally, the length should read 4 for the first 4 observations and then 3 for the next 3 observations.
        Thank you very much in advance as always!

        Best,
        John

















        Comment


        • #5

          Why not just replace the missing values with a large number (9999, say) and then replace them with missing after running tsspell

          Code:
          .
          . list, abb(12)
          
               +----------------------------------+
               | crsp_fundno   qdatev   exp_ratio |
               |----------------------------------|
            1. |        4785   1995q1       .0099 |
            2. |        4785   1995q2       .0099 |
            3. |        4785   1995q3       .0099 |
            4. |        4785   1995q4       .0099 |
            5. |        4785   1998q1           . |
               |----------------------------------|
            6. |        4785   1998q2           . |
            7. |        4785   1998q3           . |
            8. |        4785   1998q4       .0098 |
            9. |        4785   1999q1       .0098 |
           10. |        4785   1999q2       .0098 |
               |----------------------------------|
           11. |        4785   1999q3       .0098 |
           12. |        4785   1999q4       .0099 |
           13. |        4785   2000q1       .0099 |
           14. |        4785   2000q2       .0099 |
           15. |        4785   2000q3       .0099 |
               |----------------------------------|
           16. |        4785   2000q4       .0124 |
           17. |        4785   2001q1       .0124 |
           18. |        4785   2001q2       .0124 |
           19. |        4785   2001q3       .0124 |
           20. |        4785   2001q4       .0124 |
               +----------------------------------+
          
          .
          . tsset crsp_fundno qdatev
                 panel variable:  crsp_fundno (strongly balanced)
                  time variable:  qdatev, 1995q1 to 2001q4, but with a gap
                          delta:  1 quarter
          
          . tsspell exp_ratio
          
          . bysort crsp_fundno _spell : gen length = _N
          
          . drop _spell _seq _end
          
          . list, abb(12)
          
               +-------------------------------------------+
               | crsp_fundno   qdatev   exp_ratio   length |
               |-------------------------------------------|
            1. |        4785   1995q1       .0099        7 |
            2. |        4785   1995q2       .0099        7 |
            3. |        4785   1995q3       .0099        7 |
            4. |        4785   1995q4       .0099        7 |
            5. |        4785   1998q1           .        7 |
               |-------------------------------------------|
            6. |        4785   1998q2           .        7 |
            7. |        4785   1998q3           .        7 |
            8. |        4785   1998q4       .0098        4 |
            9. |        4785   1999q1       .0098        4 |
           10. |        4785   1999q2       .0098        4 |
               |-------------------------------------------|
           11. |        4785   1999q3       .0098        4 |
           12. |        4785   1999q4       .0099        4 |
           13. |        4785   2000q1       .0099        4 |
           14. |        4785   2000q2       .0099        4 |
           15. |        4785   2000q3       .0099        4 |
               |-------------------------------------------|
           16. |        4785   2000q4       .0124        5 |
           17. |        4785   2001q1       .0124        5 |
           18. |        4785   2001q2       .0124        5 |
           19. |        4785   2001q3       .0124        5 |
           20. |        4785   2001q4       .0124        5 |
               +-------------------------------------------+
          
          .
          . replace exp_ratio = 9999 if exp_ratio==.
          (3 real changes made)
          
          . tsset crsp_fundno qdatev
                 panel variable:  crsp_fundno (strongly balanced)
                  time variable:  qdatev, 1995q1 to 2001q4, but with a gap
                          delta:  1 quarter
          
          . tsspell exp_ratio
          
          . bysort crsp_fundno _spell : gen length2 = _N
          
          .
          . drop _spell _seq _end
          
          . replace exp_ratio = . if exp_ratio==9999
          (3 real changes made, 3 to missing)
          
          . list, abb(12)
          
               +-----------------------------------------------------+
               | crsp_fundno   qdatev   exp_ratio   length   length2 |
               |-----------------------------------------------------|
            1. |        4785   1995q1       .0099        7         4 |
            2. |        4785   1995q2       .0099        7         4 |
            3. |        4785   1995q3       .0099        7         4 |
            4. |        4785   1995q4       .0099        7         4 |
            5. |        4785   1998q1           .        7         3 |
               |-----------------------------------------------------|
            6. |        4785   1998q2           .        7         3 |
            7. |        4785   1998q3           .        7         3 |
            8. |        4785   1998q4       .0098        4         4 |
            9. |        4785   1999q1       .0098        4         4 |
           10. |        4785   1999q2       .0098        4         4 |
               |-----------------------------------------------------|
           11. |        4785   1999q3       .0098        4         4 |
           12. |        4785   1999q4       .0099        4         4 |
           13. |        4785   2000q1       .0099        4         4 |
           14. |        4785   2000q2       .0099        4         4 |
           15. |        4785   2000q3       .0099        4         4 |
               |-----------------------------------------------------|
           16. |        4785   2000q4       .0124        5         5 |
           17. |        4785   2001q1       .0124        5         5 |
           18. |        4785   2001q2       .0124        5         5 |
           19. |        4785   2001q3       .0124        5         5 |
           20. |        4785   2001q4       .0124        5         5 |
               +-----------------------------------------------------+

          Comment


          • #6
            Dear Kieran,
            Yes, I just realized a similar method to deal with the problem, too (using tsfill)! Thank you very much for your help!

            Best,
            John

            Comment


            • #7
              John has found a bug in tsspell (SSC). The logic it is using is that it is looking for new spells with the criterion

              Code:
                    
                    (`varlist' != L.`varlist') | (_n == 1)
              but when a gap is followed by missings, the first value after a gap is missing and the lagged value is also evaluated as missing (by virtue of the gap) and the program considers itself within the same spell as before.

              The easiest work-around with tsspell is to tsfill first.

              What the attitude should be about gaps and spells is a little open, although the result here is indeed incorrect.

              Comment


              • #8
                Note that there is already some advice in the help for tsspell on dealing with gaps.

                But I thought some more about this kind of example. I considered the possibility that tsspell reject data with gaps but that would take away its scope for some useful applications.

                For the moment a fairly easy solution for John is to spell out (*) a criterion for a new spell,

                Code:
                 tsspell , fcond(exp_ratio != L.exp_ratio | qdatev != L.qdatev + 1)
                namely that a new spell begins if the value changes or after a gap.

                Another solution is to define a sequence number and tsset in terms of that

                Code:
                bysort crsp (qdatev) : gen seq = _n
                tsset crsp seq
                and then apply tsspell. That way, tsspell ignores gaps. But for other applications you need to re-apply tsset.

                I will post new code (with a warning on gaps) and a revised help file to SSC.

                (*) All puns should be considered intentional.
                Last edited by Nick Cox; 11 May 2014, 04:11.

                Comment

                Working...
                X