Creating unique group ID variable

Gobinda Natak

Join Date: Sep 2016

Posts: 79
#1


01 Mar 2020, 10:47

Dear all

I have a hopefully trivial question that I can't get my head around right now. My data set is clustered and consists of neighborhoods, households, and household members:

Code:

clear
input neighID hhID hhmemberID
1 1 1
1 1 2
1 2 1
1 2 2
1 2 3
2 1 1
2 1 2
2 1 3
2 2 1
2 3 1
end

I.e., hhmemberID is only unique within hhID, and hhID is only unique within neighID. How do I get hhID to be unique overall, i.e., like this:

Code:

clear
input neighID hhID hhmemberID
1 1 1
1 1 2
1 2 1
1 2 2
1 2 3
2 3 1
2 3 2
2 3 3
2 4 1
2 5 1
end

One option would be to convert neighID and hhID to strings and concatenate them (or whatever the correct term for banging them into one string variable is), but is there a less haphazard option?

Cheers
Go

Clyde Schechter

Join Date: Apr 2014
Posts: 30100

01 Mar 2020, 12:19

Code:

* one ID per neighborhood-household pair
egen unique_hhID = group(neighID hhID), label
* one ID per person across the whole data set
egen unique_hhmemberID = group(neighID hhID hhmemberID), label
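On the example data from the question, this can be checked as follows. This is a sketch added for illustration, not part of the original reply: the `long` storage type, leaving out `label`, and the `isid`/`list` checks are additions. Because `group()` numbers the distinct neighID-hhID pairs 1, 2, ... in sort order, it reproduces the hhID values shown in the desired output.

```stata
* Example data from the question: hhID repeats across neighborhoods
clear
input neighID hhID hhmemberID
1 1 1
1 1 2
1 2 1
1 2 2
1 2 3
2 1 1
2 1 2
2 1 3
2 2 1
2 3 1
end

* One integer per distinct (neighID, hhID) pair, numbered in sort order
egen long unique_hhID = group(neighID hhID)

* hhID alone does not identify households; unique_hhID does (isid errors otherwise)
isid unique_hhID hhmemberID
list neighID hhID unique_hhID hhmemberID, sepby(unique_hhID) noobs
```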


Gobinda Natak

Join Date: Sep 2016

Posts: 79
#3

01 Mar 2020, 14:32

Amazing, Clyde, thanks. I did think of group() myself, but then thought that couldn't be the solution, because was I not trying to ungroup something? Thanks again, you helped me a lot.
Gobinda Natak

Join Date: Sep 2016

Posts: 79
#4

01 Mar 2020, 16:48

Clyde, after waiting 95 minutes, egen group tells me:

Code:

too many values r(134);

My data set has 1.2M hhmemberIDs, probably a few hundred thousand hhIDs, and a few dozen neighIDs. Is there any way out of this? (Reading the error message, I also suspect that my initial idea of concatenating the two as strings and then encoding the result won't work either.)
Clyde Schechter

Join Date: Apr 2014

Posts: 30100
#5

01 Mar 2020, 16:57

Code:

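* flag the first observation of each neighborhood-household pair
by neighID hhID, sort: gen unique_hhid = 1 if _n == 1
* running sum turns the flags into 1, 2, 3, ... (sum() treats missing as 0)
replace unique_hhid = sum(unique_hhid)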

The drawback to this approach is that unique_hhid will be a sequential number from 1 to however many distinct households there are, and it will not be labeled to show the values of the original neighID and hhID variables. But given the large number of values involved here, I don't think there is any easy way around that.
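Here is a sketch of the same running-sum idea applied to the example data from the question. The flag variable `first`, the `long` storage type, and the `isid` check are illustrative additions, not part of the reply above. A `long` is used because a default `float` holds integers exactly only up to 16,777,216; that is ample for a few hundred thousand households, but it is an easy limit to forget.

```stata
* Example data from the question
clear
input neighID hhID hhmemberID
1 1 1
1 1 2
1 2 1
1 2 2
1 2 3
2 1 1
2 1 2
2 1 3
2 2 1
2 3 1
end

* Sort by neighborhood and household; mark the first member of each pair with 1
by neighID hhID, sort: gen byte first = (_n == 1)

* Running sum of the markers numbers the households 1, 2, 3, ... across the file
gen long unique_hhid = sum(first)
drop first

* Fails with an error if household and member IDs do not identify people uniquely
isid unique_hhid hhmemberID
```

On this data the result matches what `egen group()` gives: households are numbered 1 to 5 in neighID-hhID order.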

If having the unique hhid show its origins in neighID and hhID is crucial for you, then your concatenation idea will work:

Code:

egen unique_hhid = concat(neighID hhID), punct(#)

But the drawback to this approach is that you will not be able to use the unique_hhid in many situations where a numeric variable is required.
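If a numeric variable that still shows its origins is what is needed, one possible middle ground (not suggested in the thread, and resting on assumptions that should be checked on the real data) is to pack both numbers into one: neighID times 1,000,000 plus hhID, so neighborhood 2, household 3 becomes 2000003. This assumes hhID is a non-negative integer below 1,000,000 and neighID is a non-negative integer no larger than 2,146, so that the result fits in a `long`. The variable names `unique_hhid_num` and `unique_hhid_str` are hypothetical.

```stata
* Example data from the question
clear
input neighID hhID hhmemberID
1 1 1
1 1 2
1 2 1
1 2 2
1 2 3
2 1 1
2 1 2
2 1 3
2 2 1
2 3 1
end

* Assumptions behind the packing below; assert stops with an error if they fail
assert hhID >= 0 & hhID < 1000000 & hhID == int(hhID)
assert neighID >= 0 & neighID <= 2146 & neighID == int(neighID)

* neighID in the millions, hhID in the last six digits (2 and 3 give 2000003)
gen long unique_hhid_num = neighID * 1000000 + hhID
isid unique_hhid_num hhmemberID

* String version from the reply above, for comparison (2 and 3 give "2#3")
egen unique_hhid_str = concat(neighID hhID), punct(#)
list neighID hhID unique_hhid_num unique_hhid_str, noobs
```

The original parts can be recovered with `floor(unique_hhid_num / 1000000)` and `mod(unique_hhid_num, 1000000)`.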