  • Dropping Duplicates and Keeping Non-Duplicates

    Dear all,

    I am working with data in which some ids appear twice and others appear only once. I would like to drop the second observation for each duplicated id but keep the observations whose ids are not duplicated. Please help me. Here is the sample data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int household_id
     1
     1
     2
     2
     3
     3
     4
     4
     5
     5
     6
     6
     7
     7
     8
     9
     9
    10
    10
    11
    end

  • #2
    I would simply use the code:

    Code:
    duplicates drop household_id

    • #3
      Hi Chris,
      Could you check whether this works:

      Code:
      duplicates drop pid, force
      Here "pid" stands for the variable holding the id, so replace it with the name of the variable you are using.

      • #4
        Code:
        duplicates drop household_id , force
        preserves singletons, but it's hard to believe that's a good idea. Check that you are not losing data on other variables.
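
        A minimal sketch of such a check, assuming the example data from #1 (the variable name dup is hypothetical). Note that duplicates drop requires the force option whenever a varlist is given.

        ```stata
        * Example data from #1: households 8 and 11 appear once, all others twice
        clear
        input int household_id
         1
         1
         2
         2
         3
         3
         4
         4
         5
         5
         6
         6
         7
         7
         8
         9
         9
        10
        10
        11
        end

        * Report how many ids are duplicated before dropping anything
        duplicates report household_id

        * dup (hypothetical name) counts the extra copies of each id; 0 marks a singleton
        duplicates tag household_id, generate(dup)
        list household_id if dup == 0

        * Keep the first observation of each id; singletons are left untouched
        duplicates drop household_id, force

        * isid exits with an error if household_id does not uniquely identify observations
        isid household_id
        count
        ```

        On this example, count should report 11 observations, one per household. With more variables in the data set, comparing the other variables within each duplicated id before dropping shows what information would be lost.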

        • #5
          Hi Nick and George. For the duplicated ids, I would like to drop the observation with gender_id == 2.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input int household_id byte gender_id
           1 1
           1 2
           2 1
           2 2
           3 1
           3 2
           4 2
           4 1
           5 2
           5 1
           6 1
           6 2
           7 2
           7 1
           8 2
           9 1
           9 2
          10 1
          10 2
          11 1
          end
          label values gender_id GENDER_ID
          label def GENDER_ID 1 "Male", modify
          label def GENDER_ID 2 "Female", modify

          • #6
            Code:
            sort household_id gender_id
            egen tag = tag(household_id)
            drop if tag == 0
            This is one way to do it; note that the sort is essential. How it works: tag will tag the first observation within household_id, which will be the male (1) if there is one, or else the female (2). Dropping the observations that aren't tagged ensures you always keep the males if they are present and the females if they aren't.

            PS: This is a bit of a "hack". There are better ways to do this, and I'm sure someone will post a less error-prone version.
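
            One possible alternative with the same intent, sketched here on the example data from #5: naming gender_id in parentheses in the by prefix states the within-household ordering explicitly instead of relying on an earlier sort.

            ```stata
            * Example data from #5 (value labels omitted for brevity)
            clear
            input int household_id byte gender_id
             1 1
             1 2
             2 1
             2 2
             3 1
             3 2
             4 2
             4 1
             5 2
             5 1
             6 1
             6 2
             7 2
             7 1
             8 2
             9 1
             9 2
            10 1
            10 2
            11 1
            end

            * Sort by household, then by gender within household, and keep only the first row:
            * the male (1) when one is present, otherwise the female (2)
            bysort household_id (gender_id): keep if _n == 1

            * Confirm one observation per household and inspect who was kept
            isid household_id
            tabulate gender_id
            ```

            On the example data this leaves 11 observations: 10 males and the single female from household 8.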

            • #7
              duplicates doesn't support that, its logic being that observations that differ in some respect are not true duplicates. Or rather, it supports an if qualifier, but observations that do not satisfy the condition are ignored entirely. So you can't compare different genders.

              What if there are 2 females in the same household?

              Study this to see if it is what you want.

              Code:
              bysort household_id (gender_id) : drop if _N == 2 &  _n == 2 & gender_id == 2
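
              To see how the command treats the cases raised above, here is a small sketch on a hypothetical data set (invented for illustration, not taken from the survey) with a mixed couple, two females, a single female, and two males.

              ```stata
              * Hypothetical households: 1 = male + female, 2 = two females,
              * 3 = one female only, 4 = two males
              clear
              input int household_id byte gender_id
              1 1
              1 2
              2 2
              2 2
              3 2
              4 1
              4 1
              end

              bysort household_id (gender_id) : drop if _N == 2 & _n == 2 & gender_id == 2

              * Household 1 keeps the male, household 2 keeps one of its two females,
              * household 3 keeps its single female, household 4 keeps both males
              list, sepby(household_id)
              ```

              So a pair of females is reduced to one observation, while a pair of males is left as it is; 5 of the 7 observations remain.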

              • #8
                Thanks, all. Nick, the data has only two decision makers per household (male and female). It doesn't cover all the household members.

                • #9
                  Originally posted by Chris Miyinzi:
                  "Thanks all. Nick the data has only two decision makers (male and female). It doesn't have all the household members."
                  There are no same-sex couples?

                  • #10
                    Yes, the data does not have same-sex couples.

                    • #11
                      Chris, do you have anything to justify the assumption that no two females live together? I think Cox's argument and Jesse's are highly valid.

                      • #12
                        Hi George, the survey was conducted for couples only, so I would like to keep information for one member of each couple only. We did not collect information on other household members.
