  • Drop observations by group size

    I have the following data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input matcheducage
     1
     2
     3
     4
     5
     6
     7
     7
     7
     7
     7
     8
     9
     9
    10
    11
    12
    12
    13
    14
    end
    I would like to drop the groups that have only one observation from my dataset, i.e. in this case groups 1, 2, 3, 4, 5, 6, 8, and so on.

    Would you have any idea on how I could code this?

    Thank you very much in advance for your suggestions!

    Best,

    Max

  • #2
    Assuming the variable whose values range from 1 to 14 in your example is called "group", I would use:

    gen help=group!=.
    egen help2=sum(help)
    drop if help2==1
    drop help help2


    • #3
      Max:
      you may want to try:
      Code:
      duplicates tag matcheducage, g(flag) 
      drop if flag==0
      Kind regards,
      Carlo
      (StataNow 18.5)


      • #4
        Code:
          
        clear
        input matcheducage
         1
         2
         3
         4
         5
         6
         7
         7
         7
         7
         7
         8
         9
         9
        10
        11
        12
        12
        13
        14
        end
        
        bysort matcheducage : drop if _N == 1
        Note that Sebastian's code won't work. The constructed variable help2 is constant across the dataset and can't possibly discriminate between singletons and duplicates.

        Carlo's code would work.
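
        A minimal sketch of why the ungrouped total cannot work, using a small made-up dataset (the values are illustrative, not Max's data). Without a by-group prefix, egen returns a single total for the whole dataset, so help2 is identical in every observation. Here total() is the current name of the egen function written as sum() in #2.

        ```stata
        clear
        input group
        1
        2
        2
        3
        end

        * help is 1 wherever group is not system missing
        gen help = group != .

        * no by-group prefix: the total is taken over all observations
        egen help2 = total(help)

        * help2 is 4 in every observation, so no observation satisfies help2 == 1
        count if help2 == 1
        ```

        With this toy data the final count is 0, so "drop if help2==1" would drop nothing, including the singleton groups 1 and 3.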
        Last edited by Nick Cox; 27 Nov 2015, 07:01.


        • #5
          Thanks for the correction, Nick. I forgot the most important part:

          gen help=group!=.
          bys group: egen help2=sum(help)
          drop if help2==1
          drop help help2


          • #6
            Sebastian: Your strategy clearly makes sense. It could be simplified thus:

            Code:
             
            bys group: egen help2 = count(group)
            drop if help2 == 1
            drop help2
            You are counting non-missing values. But doing that by group: automatically segregates the missing values of group and so the code need not be fastidious about ignoring them.

            If there are missing values of group then your help2 is 0 and those values will not be dropped.

            So your code is essentially equivalent to my code.

            (There is further small print to add if there are system or extended missing values.)
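
            A minimal sketch of how missing values separate the two approaches, assuming a made-up dataset that contains one system missing value (.) and two extended missing values (.a). The variable names n_obs and n_nonmiss are hypothetical and introduced only for this illustration.

            ```stata
            clear
            input group
            1
            2
            2
            .
            .a
            .a
            end

            * group size counted as observations, missing or not
            bysort group: gen n_obs = _N

            * group size counted as non-missing values of group
            bysort group: egen n_nonmiss = count(group)

            * . and .a form separate groups; n_nonmiss is 0 in both
            list, sepby(group)
            ```

            In this toy data, "drop if n_obs == 1" would drop group 1 and also the lone system missing value, whereas "drop if n_nonmiss == 1" would drop only group 1 and keep every observation with a missing group.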


            • #7
              Thank you very much for the quick replies! I ran Carlo's code and it worked out fine!

              Best,
              Max
