How to find the number of unique values in one variable for a given value in another

Markos Valsamis

Join Date: Jan 2022

Posts: 43
#1


27 Jan 2022, 11:39

I am having trouble working out how to deduce an association between two identifier variables. I will explain this with an example dataset below (real dataset is 400 x 500,000).
ID1     ID2
.       12abc
165498  12abc
.       12abc
798402  12abc
165498  ef4
.       ef4
.       ef4

I want to know, for each unique ID1, how many different ID2 values are associated with it. For instance, "165498" corresponds to both "12abc" and "ef4", whereas "798402" corresponds only to "12abc". Dots represent missing values, which occur in my dataset.

How can I find out if there are any situations where a certain ID1 is associated with more than one unique ID2? And can I then tag those particular ID1s to investigate further?

Many thanks in advance

Last edited by Markos Valsamis; 27 Jan 2022, 11:42.
Nick Cox

Join Date: Mar 2014

Posts: 35211
#2

27 Jan 2022, 11:49

See a thread that started earlier today, including #5 https://www.statalist.org/forums/for...ariables-match
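
For the example in #1, one common way to count distinct values of one variable within groups of another is to tag one observation per distinct pair and then total the tags within each group. What follows is a minimal sketch, not necessarily the code in the linked thread. It assumes ID1 is numeric and ID2 is a string, as the example suggests; the names pairtag, n_ID2 and multi are made up for illustration.

```stata
* Sketch only: assumes ID1 is numeric and ID2 is a string, as in the example in #1
clear
input long ID1 str5 ID2
.      "12abc"
165498 "12abc"
.      "12abc"
798402 "12abc"
165498 "ef4"
.      "ef4"
.      "ef4"
end

* Tag one observation per distinct (ID1, ID2) pair, ignoring missing ID1
egen byte pairtag = tag(ID1 ID2) if !missing(ID1)

* Count the distinct ID2 values seen with each ID1
bysort ID1: egen n_ID2 = total(pairtag)

* Flag ID1 values associated with more than one distinct ID2
generate byte multi = n_ID2 > 1 & !missing(ID1)

* Inspect the flagged cases
list ID1 ID2 n_ID2 if multi, sepby(ID1)
```

With the example data, only ID1 165498 should be flagged, since it appears with both "12abc" and "ef4". Observations with missing ID1 are left unflagged by construction; whether that is the right treatment depends on what the missing values mean in the real dataset.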