Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Counting distinct observations across time

    Dear all,

    Suppose I have the following dataset.

    input str8 groupid str8 memberid year
    "A000" "B000" 1980
    "A000" "B001" 1980
    "A000" "B002" 1980
    "A000" "B001" 1981
    "A000" "B002" 1981
    "A000" "B003" 1981
    "A000" "B004" 1981
    "A000" "B000" 1982
    "A000" "B003" 1982
    "A000" "B005" 1982
    "A000" "B004" 1983
    "A000" "B005" 1983
    "A000" "B002" 1984
    "A000" "B005" 1985
    "A000" "B006" 1985
    "A001" "B015" 1984
    "A001" "B018" 1985
    "A001" "B019" 1985
    "A001" "B017" 1986
    "A001" "B019" 1986
    "A002" "B000" 1980
    "A002" "B000" 1981
    "A002" "B005" 1981
    end

    I'm struggling to compute the number of distinct members for each group over the last 5 years.

    Could anyone please help me with this ?

    Thank you very much for your help,

    Vinh

  • #2
    Hi Vinh,

    The simplest way would be to use codebook (see help codebook) - it returns the number of unique values in its output

    e.g.

    codebook memberid if year==1980

    if you have a range of values for year you could put this in a loop - so for example your data ranges from 1980 to 1986 so you could use:

    forval x=1980/1986 {

    display _column(20) `x'
    codebook memberid if year==`x'

    }


    I hope this helps

    Robert

    Comment


    • #3
      Assuming your time/spell variable is already in a way that does not need further work, you may use the following:
      Code:
      clear
      input str8 groupid str8 memberid year
       "A000" "B000" 1980
       "A000" "B001" 1980
       "A000" "B002" 1980
       "A000" "B001" 1981
       "A000" "B002" 1981
       "A000" "B003" 1981
       "A000" "B004" 1981
       "A000" "B000" 1982
       "A000" "B003" 1982
       "A000" "B005" 1982
       "A000" "B004" 1983
       "A000" "B005" 1983
       "A000" "B002" 1984
       "A000" "B005" 1985
       "A000" "B006" 1985
       "A001" "B015" 1984
       "A001" "B018" 1985
       "A001" "B019" 1985
       "A001" "B017" 1986
       "A001" "B019" 1986
       "A002" "B000" 1980
       "A002" "B000" 1981
       "A002" "B005" 1981
       end
      
      
      by groupid memberid, sort: gen nvals1 = _n == 1
      by groupid, sort: replace nvals1 = sum(nvals1)
      by groupid, sort: replace nvals1 = nvals1[_N]
      However, if you need further streamlinening of the spell variable (i.e. year), then you may have a look at this FAQ: https://www.stata.com/support/faqs/d...-observations/ particularly, the part on tsspel (SSC).

      Comment


      • #4
        If I understand this correctly, this question was also asked recently. See https://www.statalist.org/forums/for...nge-by-inrange and a solution there to use rangerun from SSC.

        The problem is (1) counting distinct values (2) by groups (3) within moving windows. I don't think solutions in #2 or #3 match all of those.

        Last 5 years I take to mean (this year or the previous four).


        Code:
        clear
        input str8 groupid str8 memberid year
        "A000" "B000" 1980
        "A000" "B001" 1980
        "A000" "B002" 1980
        "A000" "B001" 1981
        "A000" "B002" 1981
        "A000" "B003" 1981
        "A000" "B004" 1981
        "A000" "B000" 1982
        "A000" "B003" 1982
        "A000" "B005" 1982
        "A000" "B004" 1983
        "A000" "B005" 1983
        "A000" "B002" 1984
        "A000" "B005" 1985
        "A000" "B006" 1985
        "A001" "B015" 1984
        "A001" "B018" 1985
        "A001" "B019" 1985
        "A001" "B017" 1986
        "A001" "B019" 1986
        "A002" "B000" 1980
        "A002" "B000" 1981
        "A002" "B005" 1981
        end
        
        egen numericid = group(memberid)
        
        program myprog
            quietly tab numericid
            gen ndistinct = r(r)
        end
        
        rangerun myprog, use(numericid) interval(year -4 0) by(groupid)
        
        list, sepby(groupid year)
            
             +-------------------------------------------------+
             | groupid   memberid   year   numeri~d   ndisti~t |
             |-------------------------------------------------|
          1. |    A000       B000   1980          1          3 |
          2. |    A000       B001   1980          2          3 |
          3. |    A000       B002   1980          3          3 |
             |-------------------------------------------------|
          4. |    A000       B001   1981          2          5 |
          5. |    A000       B002   1981          3          5 |
          6. |    A000       B003   1981          4          5 |
          7. |    A000       B004   1981          5          5 |
             |-------------------------------------------------|
          8. |    A000       B000   1982          1          6 |
          9. |    A000       B003   1982          4          6 |
         10. |    A000       B005   1982          6          6 |
             |-------------------------------------------------|
         11. |    A000       B004   1983          5          6 |
         12. |    A000       B005   1983          6          6 |
             |-------------------------------------------------|
         13. |    A000       B002   1984          3          6 |
             |-------------------------------------------------|
         14. |    A000       B005   1985          6          7 |
         15. |    A000       B006   1985          7          7 |
             |-------------------------------------------------|
         16. |    A001       B015   1984          8          1 |
             |-------------------------------------------------|
         17. |    A001       B018   1985         10          3 |
         18. |    A001       B019   1985         11          3 |
             |-------------------------------------------------|
         19. |    A001       B017   1986          9          4 |
         20. |    A001       B019   1986         11          4 |
             |-------------------------------------------------|
         21. |    A002       B000   1980          1          1 |
             |-------------------------------------------------|
         22. |    A002       B000   1981          1          2 |
         23. |    A002       B005   1981          6          2 |
             +-------------------------------------------------+
        If you mean year by year, the interval becomes year 0 0

        Comment


        • #5
          Hi Robert and Abdul,

          Thank you very much for your help. Sorry, I should have explained a bit more.

          In this case, I'd like to count the number of distinct members over the period of past 5 years. For example, in year 1985, I'd like to compute how many distinct members there are from 1981 to 1985. Also there are time gaps in my panel data.

          Many thanks for your help,

          Vinh

          Comment


          • #6
            Originally posted by Nick Cox View Post
            If I understand this correctly, this question was also asked recently. See https://www.statalist.org/forums/for...nge-by-inrange and a solution there to use rangerun from SSC.

            The problem is (1) counting distinct values (2) by groups (3) within moving windows. I don't think solutions in #2 or #3 match all of those.

            Last 5 years I take to mean (this year or the previous four).
            Thank you very much for your help, Nick. That's exactly what I'd like to have.


            Comment

            Working...
            X