Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Tabulating distinct counts of an ID by group

    I have the data in dataex below, I just want a table that is number of unique cID by group
    so like this (given the dataex below):
    Group 1 Group 2 Group 3 Group 4
    distinct cID 2 1 2 1

    There are many more cID in the data, from using codebook group 2 has 33 distinct cID values in my actual data, but I would prefer to have a convenient way to tabulate this without having to do codebook since there will be many different stratification.
    Thanks



    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float cID long group
    1 1
    1 3
    1 3
    1 3
    1 3
    1 3
    1 1
    1 3
    1 3
    1 3
    2 4
    2 3
    2 2
    2 2
    2 2
    2 2
    2 3
    2 2
    2 3
    2 2
    2 3
    2 2
    2 2
    2 2
    2 2
    2 3
    2 2
    2 3
    2 2
    2 2
    2 3
    2 2
    2 2
    2 2
    2 2
    2 2
    2 2
    2 3
    2 3
    2 3
    2 4
    2 3
    2 3
    2 3
    2 2
    2 1
    2 2
    2 3
    2 2
    2 3
    2 3
    2 3
    2 3
    2 4
    2 2
    2 2
    2 3
    2 2
    2 2
    2 3
    2 2
    2 2
    2 3
    2 2
    2 3
    2 3
    2 2
    2 2
    2 2
    2 3
    2 3
    2 2
    2 2
    2 3
    2 4
    2 2
    2 2
    2 3
    2 2
    2 2
    2 2
    2 2
    2 3
    2 3
    2 3
    2 2
    2 2
    2 2
    2 2
    2 3
    2 2
    2 2
    2 3
    2 2
    2 1
    2 2
    2 3
    2 2
    2 3
    2 3
    end

  • #2
    This example may help:

    Code:
    . egen tag = tag(cID group)
    
    . egen distinct = total(tag), by(group)
    
    . tabdisp group, c(distinct)
    
    ----------------------
        group |   distinct
    ----------+-----------
            1 |          2
            2 |          1
            3 |          2
            4 |          1
    ----------------------
    See e.g.

    SJ-15-3 dm0042_2 . . . . . . . . . . . . . . . . Software update for distinct
    (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
    Q3/15 SJ 15(3):899
    improved table format and display of large numbers of
    observations

    SJ-12-2 dm0042_1 . . . . . . . . . . . . . . . . Software update for distinct
    (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
    Q2/12 SJ 12(2):352
    options added to restrict output to variables with a minimum
    or maximum of distinct values

    SJ-8-4 dm0042 . . . . . . . . . . . . Speaking Stata: Distinct observations
    (help distinct if installed) . . . . . . N. J. Cox and G. M. Longton
    Q4/08 SJ 8(4):557--568
    shows how to answer questions about distinct observations
    from first principles; provides a convenience command

    dm0042 is thus revealed as an otherwise unpredictable search terms for many similar questions here,
    Last edited by Nick Cox; 03 Nov 2019, 08:45.

    Comment


    • #3
      That worked, thanks Nick

      Comment


      • #4
        Actually, that gave the distinct numbers but I can't subset on other variables in the data, for instance if I want to stratify by year I get the same number in each group (even thought there should be fewer IDs.

        Comment


        • #5
          #2 shows the principle. If you want separate results by year, modify the code accordingly.

          Comment


          • #6
            ... meaning something like

            Code:
            egen tag = tag(cID group year)  
            egen distinct = total(tag), by(group year)  
            tabdisp year group, c(distinct)

            Comment


            • #7
              Thanks, that worked!

              Comment

              Working...
              X