Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Generate list of unique IDs based on missingness

    Hi there Statalist. I have panel data and am trying to produce a list of IDs that have at least one wave with complete (i.e., no missing) data for var1 and var2 and var3. So in this below data example, id1 and id3 would both be tagged because id1 has no missingness on those vars in wave 3, and id3 has no missingness in waves 1 and 3. Ideally this would get tagged in a way that lets me use -browse- to copy this list of IDs (which will be in the thousands in my real data).

    Code:
    clear all
    input id wave var1 var2 var3
    1 1 1 . 1
    1 2 0 1 .
    1 3 0 1 0
    2 1 0 1 .
    2 2 . . 1
    2 3 1 0 .
    3 1 1 0 1
    3 2 . 0 1
    3 3 2 0 2
    end

  • #2
    Code:
    egen mcount = rowmiss(var1 var2 var3)
    by id (wave), sort: egen n_complete_cases = total(mcount == 0)
    by id (wave): gen byte tag = (n_complete_cases > 0 & _n == 1)
    browse id if tag

    Comment


    • #3
      Functions are your friends.

      For any id and wave, !missing(var1, var2, var3) is true or 1 if all variables are not missing and false or 0 otherwise.

      So, if we add that expression across each panel, the result will be positive (1, 2, or 3) if true in that many waves and 0 otherwise.

      Code:
      clear all
      input id wave var1 var2 var3
      1 1 1 . 1
      1 2 0 1 .
      1 3 0 1 0
      2 1 0 1 .
      2 2 . . 1
      2 3 1 0 .
      3 1 1 0 1
      3 2 . 0 1
      3 3 2 0 2
      end
      
      egen wanted = total(!missing(var1, var2, var3)), by(id)
      
      egen tag = tag(id)
      
      edit if wanted
      
      list id if wanted & tag
      
           +----+
           | id |
           |----|
        1. |  1 |
        7. |  3 |
           +----+
      Several vital Stata features are useful here, not least dealing with true and false.

      Code:
      FAQ     . . . . . . . . . . . . . . . . . . . . . . .  True and false in Stata
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              8/05    What is true and false in Stata?
                      http://www.stata.com/support/faqs/data-management/
                      true-and-false/
      
      SJ-19-1 dm0099  . . . . . .  How best to generate indicator or dummy variables
              . . . . . . . . . . . . . . . . . . . .  N. J. Cox and C. B. Schechter
              Q1/19   SJ 19(1):246--259                                (no commands)
              discusses how to best generate indicator or dummy variables
      
      SJ-16-1 dm0087  . . . Speaking Stata: Truth, falsity, indication, and negation
              . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
              Q1/16   SJ 16(1):229--236                                (no commands)
              looks at the following concepts: indicator variables, by: for
              groupwise calculations, and control of sort order to enable
              exactly what you want

      Code:
      help egen
      tells you about other functions to count missing values that are especially helpful if the number of variables is really (much?) larger than 3. (EDIT: One such is used in @Clyde Schechter's answer #2, not visible to me until I had posted.)

      For an opinionated riff on why you should say and write distinct not unique see Section 2 of https://journals.sagepub.com/doi/pdf...867X0800800408
      Last edited by Nick Cox; 28 Oct 2024, 11:55.

      Comment


      • #4
        Thanks both. Nick Cox , one follow-up question: what is the necessity of calling up the data editor if wanted (via -edit if wanted-)?

        Comment


        • #5
          There is no necessity. It is just possible and may be useful.

          Comment

          Working...
          X