  • Dataset with multiple observations per ID: how to count the number of distinct IDs?

    Hi!

    I'm fairly new to Stata and have searched the forum on this topic without any luck, so I thought I'd write a post. I'm sorry if this is too basic.

    I have a large dataset with about 400,000 observations. Each patient appears in multiple observations (because of clinical follow-ups). I would like to count how many unique IDs (i.e., patients) there are in the dataset. Any ideas?
    Last edited by Vilma Antonov; 24 Aug 2022, 09:24.

  • #2
    Vilma:
    welcome to this forum.
    You may want to consider something along the following lines:
    Code:
    . use "https://www.stata-press.com/data/r17/nlswork.dta"
    . tab idcode if idcode<=5
    
         NLS ID |      Freq.     Percent        Cum.
    ------------+-----------------------------------
              1 |         12       19.67       19.67
              2 |         12       19.67       39.34
              3 |         15       24.59       63.93
              4 |         11       18.03       81.97
              5 |         11       18.03      100.00
    ------------+-----------------------------------
          Total |         61      100.00
    
    .
    Another option might be -collapse-.
    Kind regards,
    Carlo
    (Stata 19.0)
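The -collapse- route mentioned above can be sketched as follows on the same nlswork data, assuming idcode is the identifier (n_obs and n_ids are hypothetical names chosen for illustration). After -collapse ..., by(idcode)- there is exactly one observation per distinct idcode, so _N is the count; -preserve- and -restore- bring the original data back afterwards.

```stata
* Sketch only: assumes idcode identifies individuals (replace with your patient id)
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
preserve
* reduce to one row per idcode; n_obs = number of nonmissing year values per id
collapse (count) n_obs = year, by(idcode)
* one row per distinct idcode, so _N is the number of distinct ids
scalar n_ids = _N
restore
display "Distinct idcode values: " scalar(n_ids)
```

Because -collapse- replaces the data in memory, the -preserve-/-restore- pair matters; the scalar survives -restore- because it is not part of the dataset.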

    • #3
      Nobuya Fukugawa started a thread with essentially the same question earlier today.

      • #4
        In addition to the possibilities suggested in #2, there is -distinct-, by Gary Longton & Nick Cox, available from SSC. And there is -egen, nvals()- in the -egenmore- package, by Nick Cox, also available from SSC.

        Another approach is:
        Code:
        by id, sort: gen long counter = sum(id != id[_n-1])
        which assigns consecutive integers, starting from 1, to the distinct ids. Then counter[_N] will be the number of distinct ids in the dataset.

        • #5
          Thank you all so much for your suggestions! I didn't manage to make Carlo's suggestion work (I think the user, -me-, is to blame); however, -distinct- worked!

          I tried what Clyde suggested in #4; however, the new variable came out as 1 for every ID... I don't understand why, since these are different individual IDs.

          • #6
            My error in #4. It should be:
            Code:
            sort id
            gen long counter = sum(id != id[_n-1])
            Sorry about that.
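Why #4 gave 1 everywhere: under -by id, sort:-, both _n and sum() work within each id group, so id[_n-1] is missing on each group's first row and the running sum restarts for every id, leaving counter = 1 throughout. With a plain -sort-, both run over the whole dataset. A minimal sketch of the corrected version on the nlswork data from #2, assuming idcode as the identifier:

```stata
* Sketch only: assumes idcode identifies individuals (replace with your patient id)
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
* plain sort, NOT -by idcode, sort:-, so idcode[_n-1] looks across id boundaries
sort idcode
* 1 on the first row of each new idcode, 0 otherwise; sum() accumulates over all rows
generate long counter = sum(idcode != idcode[_n-1])
* the last observation carries the total number of distinct ids
display "Distinct idcode values: " counter[_N]
```

If the identifier can be missing, the missing values sort to the end and add one to the count.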

            • #7
              It depends on the exact context, but if you just want the number:
              Code:
              use "http://fmwww.bc.edu/repec/bocode/s/scul_Reunification.dta", clear

              cls
              quietly xtset
              quietly inspect `r(panelvar)'

              display r(N_unique)

              • #8
                Hi!

                I'm new to Stata! My dataset is structured as shown below (hypothetical data). When running a Prais-Winsten regression, I would like to store the p-values and confidence intervals in new variables.

                The command -statsby, by(id): prais log10_prevame year- stores only the coefficients (betas) in new variables. How do I store the p-values and CIs as well?

                I also tried the -regsave- command (below), but it stores only the results of the last Prais-Winsten regression, not the whole set of regressions by id (my dataset has approximately 2,000 ids).

                by(id): prais log10_prevame year
                regsave, tstat pval ci

                [Screenshot of example data: stata.png]

                Thank you all for your help.
                • #9
                  #8 is wildly off-topic for this thread. Posts in this Forum are not simply dialogs between a questioner and responder(s). Other people come to this Forum from time to time and search for already-existing answers to their questions. Also, those who respond to questions use the thread titles to decide which posts to read. So when threads go off topic, many people's time gets wasted.

                  Please repost in a new thread. When you do that, also please heed the advice in the Forum FAQ (especially #12) pointing out that screenshots of data are not helpful, and recommending, instead, the use of the -dataex- command for showing example data. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.

                  • #10
                    I think the easiest way to do what the OP wants is:

                    Code:
                    egen tag = tag(id)
                    
                    count if tag

                    • #11
                      I am so sorry for missing #6, #7 and #10, all great suggestions! Thanks a lot!
