

  • How to find the number of unique values in one variable for a given value in another

    I am having trouble working out the association between two identifier variables. I will explain this with the example dataset below (the real dataset is 400 x 500,000).
    ID1 ID2
    . 12abc
    165498 12abc
    . 12abc
    798402 12abc
    165498 ef4
    . ef4
    . ef4
    I want to know, for each unique ID1, how many different ID2s are associated with it. For instance, "165498" corresponds to both "12abc" and "ef4". On the other hand, "798402" corresponds only to "12abc". Dots represent missing values, which also occur in my real dataset.

    How can I find out if there are any situations where a certain ID1 is associated with more than one unique ID2? And can I then tag those particular ID1s to investigate further?

    Many thanks in advance
    Last edited by Markos Valsamis; 27 Jan 2022, 11:42.

  • #2
    See a thread that started earlier today, including #5
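
For readers without access to the linked thread, the counting logic asked about in #1 can be sketched as follows. This is only an illustration in plain Python, not the Stata solution from that thread: the function name `distinct_id2_per_id1` and the variables `rows`, `counts`, `flagged`, and `tagged_rows` are made up for this example, and `None` stands in for the missing values shown as dots. It assumes that observations with a missing ID1 should be ignored rather than treated as a group of their own.

```python
from collections import defaultdict

# Example rows from post #1 as (ID1, ID2) pairs.
# None stands in for the missing values shown as "." in the post.
rows = [
    (None, "12abc"),
    (165498, "12abc"),
    (None, "12abc"),
    (798402, "12abc"),
    (165498, "ef4"),
    (None, "ef4"),
    (None, "ef4"),
]


def distinct_id2_per_id1(rows):
    """Return {ID1: number of distinct ID2 values seen with it}.

    Rows where either identifier is missing are skipped (an assumption).
    """
    seen = defaultdict(set)
    for id1, id2 in rows:
        if id1 is None or id2 is None:
            continue
        seen[id1].add(id2)
    return {id1: len(id2s) for id1, id2s in seen.items()}


counts = distinct_id2_per_id1(rows)

# ID1 values associated with more than one distinct ID2.
flagged = sorted(id1 for id1, n in counts.items() if n > 1)

# Tag every original row so the flagged ID1s can be inspected further.
tagged_rows = [(id1, id2, id1 in flagged) for id1, id2 in rows]

print(counts)
print(flagged)
```

Collecting the ID2 values in a set per ID1 means repeated pairs are counted once, so the length of each set is the number of distinct ID2s for that ID1.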

