Deleting duplicates only for one column and keep the rest the same

Francesco Tucci

Join Date: Aug 2022
Posts: 13

19 Jun 2024, 03:58

Hi all,

I would like to know the shortest way (and, if possible, whether there is a command) to drop the duplicates present in one column while keeping the rest of the dataset the same.
Here is what my dataset looks like at the moment:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long senator_id float(senators_ideology1 n_counts)
1   -.5332484  8
1    .3678178  8
1   4.4572864  8
1    1.907662  8
1    1.651716  8
1    38.13142  8
1    40.69221  8
1   -2.500967  8
2   -4.860859  6
2    -.970969  6
2    39.88372  6
2   28.558784  6
2 -.034183454  6
2   34.688347  6
3    42.39945 18
3     37.4416 18
3    39.82991 18
3    41.42913 18
3  -1.2361712 18
3    1.414266 18
end
label values senator_id senator_id
label def senator_id 1 "a000109", modify
label def senator_id 2 "a000121", modify
label def senator_id 3 "a000360", modify

Here instead is what I would like to achieve:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long senator_id float(senators_ideology1 n_counts)
1   -.5332484  8
1    .3678178  6
1   4.4572864  18
1    1.907662
1    1.651716
1    38.13142
1    40.69221
1   -2.500967
2   -4.860859
2    -.970969
2    39.88372
2   28.558784
2 -.034183454
2   34.688347
3    42.39945
3     37.4416
3    39.82991
3    41.42913
3  -1.2361712
3    1.414266
end
label values senator_id senator_id
label def senator_id 1 "a000109", modify
label def senator_id 2 "a000121", modify
label def senator_id 3 "a000360", modify

What I would then like to do is expand the senators_ideology1 variable vertically as many times as the n_counts variable says. Does anyone have any idea how to do this? I hope the example is sufficiently clear.
Many thanks in advance for your help and support!


Ken Chui

Join Date: Aug 2014

Posts: 1063
#2

19 Jun 2024, 09:25

I'm unsure about the rationale behind this, but here is one way to achieve it:

Code:

tempfile t1
save `t1'
collapse (mean) to_expand = n_counts, by(senator_id)
drop senator_id
merge 1:1 _n using `t1', nogen
George Ford

Join Date: Aug 2014

Posts: 3337
#3

19 Jun 2024, 11:06

Code:

bys senator_id: replace n_counts = . if _n!=1
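
To see what this line does, here is a small sketch on a cut-down copy of the example data (two rows per senator rather than the full set, an assumption made only for brevity). Note that it blanks the repeats in place, so the surviving values stay on the first row of each senator rather than being stacked in the first three rows as in the desired layout shown in #1.

```stata
clear
input long senator_id float(senators_ideology1 n_counts)
1  -.5332484  8
1   .3678178  8
2  -4.860859  6
2   -.970969  6
3   42.39945 18
3    37.4416 18
end

* Keep n_counts only on the first row of each senator; set the rest to missing.
* n_counts is constant within senator, so it does not matter which row sorts first.
bysort senator_id: replace n_counts = . if _n != 1

* Inspect the result, with a separator line between senators
list, sepby(senator_id) noobs
```

No rows are dropped here; only the repeated n_counts values become missing.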
George Ford

Join Date: Aug 2014

Posts: 3337
#4

19 Jun 2024, 11:08

or just mark them:

Code:

bys senator_id: g first = _n==1
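
If the eventual goal is to repeat rows according to n_counts (one possible reading of the original question, not something confirmed in this thread), the built-in expand command may be relevant. The sketch below uses a hypothetical cut-down dataset, keeps one row per senator via the first marker, and then makes n_counts copies of each remaining row.

```stata
clear
input long senator_id float n_counts
1  8
1  8
2  6
2  6
3 18
3 18
end

* Mark the first row of each senator, as in the line above
bysort senator_id: gen byte first = (_n == 1)

* Keep one row per senator, then replace each row with n_counts copies of itself
keep if first
expand n_counts

* Sanity check: each senator should now appear exactly n_counts times
bysort senator_id: assert _N == n_counts
```

Whether this matches the intended result depends on what "expand vertically" is meant to achieve, which the original post leaves open.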
Hemanshu Kumar

Join Date: Mar 2015

Posts: 1548
#5

24 Jun 2024, 02:48

This sounds like a complicated way to achieve something, but what you are trying to do is unclear (see XY problem). Could you tell us what problem you are trying to solve by this algorithm?