  • Deleting duplicates in only one column and keeping the rest the same

    Hi all,

    I would like to know the shortest way (and whether there is a command) to drop the duplicates present in one column while keeping the rest of the dataset the same.
    Here is what my dataset looks like at the moment:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long senator_id float(senators_ideology1 n_counts)
    1   -.5332484  8
    1    .3678178  8
    1   4.4572864  8
    1    1.907662  8
    1    1.651716  8
    1    38.13142  8
    1    40.69221  8
    1   -2.500967  8
    2   -4.860859  6
    2    -.970969  6
    2    39.88372  6
    2   28.558784  6
    2 -.034183454  6
    2   34.688347  6
    3    42.39945 18
    3     37.4416 18
    3    39.82991 18
    3    41.42913 18
    3  -1.2361712 18
    3    1.414266 18
    end
    label values senator_id senator_id
    label def senator_id 1 "a000109", modify
    label def senator_id 2 "a000121", modify
    label def senator_id 3 "a000360", modify
    Here instead is what I would like to achieve:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input long senator_id float(senators_ideology1 n_counts)
    1   -.5332484  8
    1    .3678178  6
    1   4.4572864  18
    1    1.907662
    1    1.651716
    1    38.13142
    1    40.69221
    1   -2.500967
    2   -4.860859
    2    -.970969
    2    39.88372
    2   28.558784
    2 -.034183454
    2   34.688347
    3    42.39945
    3     37.4416
    3    39.82991
    3    41.42913
    3  -1.2361712
    3    1.414266
    end
    label values senator_id senator_id
    label def senator_id 1 "a000109", modify
    label def senator_id 2 "a000121", modify
    label def senator_id 3 "a000360", modify
    Then I would like to expand the senators_ideology variable vertically as many times as the n_counts variable says. Does anyone have any idea how to do this? I hope the example is sufficiently clear.
    Many thanks in advance for your help and support!

  • #2
    I'm unsure of the rationale behind this, but here is one way to achieve it:

    Code:
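    * Save the full dataset so it can be merged back afterwards
    tempfile t1
    save `t1'

    * Reduce to one row per senator, holding that senator's n_counts
    collapse (mean) to_expand = n_counts, by(senator_id)
    drop senator_id
    * Attach the collapsed values to the first rows of the full dataset, matching by row number
    merge 1:1 _n using `t1', nogen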
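
    For the follow-up step in the question, here is a minimal sketch, assuming that "expanding" means replicating rows with Stata's expand command. That reading of the question is an assumption, and the three-row dataset below is a made-up toy, not the poster's data. The variable to_expand stands in for the one created by the collapse/merge step.

    ```stata
    * Toy data (hypothetical values) with a count on some rows only
    clear
    input long senator_id float(senators_ideology1 to_expand)
    1 -.5332484 2
    1  .3678178 3
    2 -4.860859 .
    end

    * Replace each row with to_expand copies of itself.
    * A missing (or < 1) count is treated as 1, so that row is kept once.
    expand to_expand

    * 2 + 3 + 1 = 6 observations are expected here
    count
    ```

    Because expand treats a missing count as 1, the rows without a count survive unchanged rather than being dropped.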


    • #3
      Code:
      bys senator_id: replace n_counts = . if _n!=1
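
      One caveat worth sketching: bysort sorts the data, and sort does not keep tied observations in their original order unless told to, so which row ends up first within each senator can vary. A minimal variant, assuming the dataex example from the first post is in memory (obs_order is a hypothetical helper name, not part of the original data):

      ```stata
      * Record the original row position so ties are broken reproducibly
      gen long obs_order = _n

      * Sort by senator, then by original position; blank all but the first row
      bysort senator_id (obs_order): replace n_counts = . if _n > 1
      ```

      Since n_counts is constant within each senator, the stored values end up the same either way; the extra sort key only guarantees that the rows stay in their original order.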


      • #4
        or just mark them:

        Code:
         
        bys senator_id: gen first = _n == 1
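
        A similar marker can be sketched with egen's tag() function, again assuming the dataex example from the first post is in memory (first_tag is a hypothetical variable name chosen to avoid clashing with first above):

        ```stata
        * Tag exactly one observation per senator with 1; all others get 0
        egen byte first_tag = tag(senator_id)

        * Show one row per senator together with its count
        list senator_id n_counts if first_tag
        ```

        Marking rows rather than overwriting n_counts leaves the original data intact, so the marker can be used in an if qualifier wherever one row per senator is needed.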


        • #5
          This sounds like a complicated way to achieve something, but what you are trying to do is unclear (see XY problem). Could you tell us what problem you are trying to solve by this algorithm?
